How GPUse Billing Works

You pay only for GPU time from your first request until the instance scales to zero - not for provisioning or model loading.

No Warmup Charges

Model loading is FREE

(~2-5 min)

Per-Second Billing

$0.0002028/second

(~$0.73/hr)

Auto Scale-to-Zero

Stops billing when idle

(after 5-15 min no requests)

When Billing Starts and Stops

PROVISIONING

FREE (~2-5 min)

Container starts and downloads model weights
Model loads into GPU memory
Health checks verify readiness

Billing: $0.00

FIRST REQUEST

BILLING STARTS

First HTTP request hits endpoint
Billing activates immediately
Actual work begins

Billing: $0.0002028/sec

ACTIVE USAGE

BILLING CONTINUES

Workload is processing
Instance stays warm for fast responses
Charges continue during pauses

Billing: $0.0002028/sec

SCALING DOWN

BILLING CONTINUES

No requests for 5-15 minutes
Instance scales to zero
Billing stops when scaled to zero

Billing: $0.0002028/sec
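
The four phases above can be sketched as a small calculation. This is an illustrative model only, not GPUse's billing code: the function names and the idea of measuring times in seconds from deployment are assumptions made for the example. It shows that provisioning never enters the bill, while the scale-down window does.

```python
RATE_PER_SECOND = 0.0002028  # USD per second, the rate quoted on this page


def billable_seconds(first_request_s, last_request_s, scale_down_s):
    """Seconds billed for one instance (hypothetical helper).

    Times are seconds since deployment. Everything before the first
    request is provisioning, which is free, so it is never counted.
    The scale-down window after the last request is billed.
    """
    if last_request_s < first_request_s:
        raise ValueError("last request cannot precede first request")
    return (last_request_s - first_request_s) + scale_down_s


def cost_usd(seconds):
    """Cost of a number of billed seconds at the quoted rate."""
    return seconds * RATE_PER_SECOND


# 3 min provisioning, 5 min of requests, 10 min until scaled to zero
seconds = billable_seconds(first_request_s=180, last_request_s=480, scale_down_s=600)
print(seconds, round(cost_usd(seconds), 4))
```

Calling stop_compute right after the last request corresponds to scale_down_s=0 in this sketch.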

What You're Actually Paying For

Serverless GPU Instance

Component	Per Second	Per Hour	Per Day (8 hrs)
GPUse Total Rate	$0.0002028	~$0.73	~$5.84
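
The hourly and daily figures in this table follow directly from the per-second rate. A minimal sketch of the conversion (the rate_table helper is hypothetical, written only for this example):

```python
RATE_PER_SECOND = 0.0002028  # USD per second, as quoted on this page


def rate_table(rate_per_second, hours_per_day=8):
    """Derive hourly and daily figures from a per-second rate."""
    per_hour = rate_per_second * 3600
    return {
        "per_second": rate_per_second,
        "per_hour": per_hour,
        "per_day": per_hour * hours_per_day,
    }


rates = rate_table(RATE_PER_SECOND)
print(f"${rates['per_hour']:.2f}/hr, ${rates['per_day']:.2f} per 8-hour day")
```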

Billing Granularity

Minimum: 100 milliseconds (0.1 seconds)
Increment: Per-second billing
Instance Minimum: 1 minute from startup to termination
Grace Period: 5 minutes FREE per project (before any billing)

What Does This Actually Cost?

Quick Inference Job

Transcribe a 30-minute podcast with Whisper

Provisioning: 3 minutes (FREE)
Processing: 5 minutes ($0.06)
$0.0002028 × 300 sec = $0.06
Scaling down: 5-15 min (up to $0.0002028 × 900 sec = $0.18)

Total: ~$0.24 (assuming the full 15-minute scale-down)

Development Session

Test and iterate on a model deployment

Provisioning: 4 minutes (FREE)
Active development: 2 hours
$0.73/hr × 2 = $1.46
Scaling down: 5-15 min (up to $0.18)

Total: ~$1.64 (assuming the full 15-minute scale-down)

Production Batch Job

Process 1000 images with SDXL

Provisioning: 5 minutes (FREE)
Processing: 8 hours continuous
$0.73/hr × 8 = $5.84
Scaling down: 5-15 min (up to $0.18)

Total: ~$6.02 (assuming the full 15-minute scale-down)
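
The three totals above correspond to the longest scale-down window. As a rough sketch, the same arithmetic can be run for both ends of the 5-15 minute range. The estimate function and the scenario dictionary are made up for this example, and the figures use the exact per-second rate rather than the rounded $0.73/hr.

```python
RATE_PER_SECOND = 0.0002028  # USD per second, as quoted on this page


def estimate(active_seconds, scale_down_minutes):
    """Estimated cost: active time plus the billed scale-down window."""
    return (active_seconds + scale_down_minutes * 60) * RATE_PER_SECOND


# Active (billed) time for each example; provisioning is free and omitted.
scenarios = {
    "Quick inference job": 5 * 60,
    "Development session": 2 * 3600,
    "Production batch job": 8 * 3600,
}

for name, active in scenarios.items():
    low = estimate(active, 5)    # instance scales to zero after 5 minutes
    high = estimate(active, 15)  # instance scales to zero after 15 minutes
    print(f"{name}: ${low:.2f} to ${high:.2f}")
```

The upper figure in each line matches the total shown for that example.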

GPUse vs Traditional GPU Providers

Feature	Traditional Providers	GPUse
Charged for provisioning	[✓] Yes (you pay for setup)	[✗] No (FREE warmup period)
Charged for model loading	[✓] Yes (~5-10 min)	[✗] No (FREE until first request)
Billing starts	At deployment	At first request
Billing granularity	Per hour (rounded up)	Per second
Scale to zero	Manual or not available	Automatic (5-15 min idle)
Minimum charge	1 hour	1 minute
Cold start time	Charged	FREE
Idle time cost	Full rate	Auto scales to zero

Bottom Line: Traditional providers charge ~$0.84-1.20/hr from deployment.
GPUse charges $0.73/hr ONLY when you're using the GPU.
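
To make the comparison concrete, here is a hedged sketch of the two billing models as the table describes them. The traditional model is an assumption built from the table rows (billed from deployment, rounded up to whole hours, 1-hour minimum); real providers differ, and both function names are hypothetical.

```python
import math

GPUSE_RATE_PER_SECOND = 0.0002028  # USD per second, as quoted on this page


def traditional_cost(provision_s, work_s, hourly_rate):
    """Hourly billing from deployment, rounded up, 1-hour minimum (assumed model)."""
    hours = max(1, math.ceil((provision_s + work_s) / 3600))
    return hours * hourly_rate


def gpuse_cost(work_s, scale_down_s):
    """Per-second billing from first request through scale-down."""
    return (work_s + scale_down_s) * GPUSE_RATE_PER_SECOND


# Quick inference job: 3 min provisioning, 5 min of work, 15 min scale-down
print(f"traditional: ${traditional_cost(180, 300, 0.84):.2f} to "
      f"${traditional_cost(180, 300, 1.20):.2f}")
print(f"GPUse: ${gpuse_cost(300, 900):.2f}")
```

The gap is largest for short jobs, where hourly rounding dominates the bill. For long continuous jobs the difference comes mostly from the hourly rate itself.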

You Stop Paying When...

Billing Stops When

Instance completes scaling to zero
You manually stop the instance via stop_compute
Grace period expires without checkout

Billing Continues During

Active processing and requests
Scale-to-zero period (5-15 min after last request)

Frequently Asked Questions

How does billing differ from traditional providers?

Traditional providers charge from the moment you click "deploy". GPUse charges from your first request - meaning provisioning and model loading are FREE.

Why do you charge during the scale-to-zero period?

Serverless GPUs stay warm for a period of time waiting for follow-up requests, which prevents a 2-5 minute delay on every request. The instance scales to zero within 5-15 minutes so that you don't keep paying for compute time you aren't using.

What if I forget to stop an instance?

The instance will automatically scale to zero after 5-15 minutes of no requests. You continue to be billed during this scale-down period, and billing stops once the instance has scaled to zero. You can also manually call stop_compute to terminate immediately and avoid the scale-down charges.

How accurate is the scale-to-zero timeline?

It varies from 5 to 15 minutes depending on traffic patterns. GPUse does not control this timeline - it's managed by the underlying infrastructure. Scale-to-zero consistently occurs within this range, often toward the shorter end.

Do I pay for failed requests or errors?

Yes - billing is based on instance runtime, not successful requests. If your workload fails, you still pay for the GPU time used. This is why the grace period is valuable for testing.

What happens if I run out of credits?

Your instances will continue running until they scale to zero or are manually stopped. You'll receive low-balance warnings via the MCP tools and dashboard.

Is the grace period per account or per project?

Per account. Use unique project_id values to get 5 minutes FREE for each account.

Why is provisioning FREE if it uses cloud resources?

We believe you should only pay for actual work. Model loading and container startup are infrastructure overhead, not your compute work. This policy sets GPUse apart from traditional providers.

How do I view my billing costs?

We provide detailed billing per GPU instance. Log in to your dashboard to view usage history showing each instance with timestamps and costs. You can filter by active instances or view terminated instances in the history tab. Each entry shows the exact runtime and cost for that GPU instance.

Why do I see irregular billing intervals in my usage history?

Our system checks your GPU usage every 30 seconds, but each check takes a few seconds to complete. This means billing intervals appear slightly irregular in your usage history: you might see entries at 33, 35, or 38-second intervals instead of exactly 30 seconds.

You are always billed for the exact time your GPU was running. Note that billing continues during the scale-to-zero period (up to 15 minutes after your last request) until the instance fully terminates.
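
A small sketch of why irregular intervals do not change the total. The interval lengths below are invented for illustration and the code is not GPUse's metering logic; it only shows that per-interval charges add up to runtime multiplied by the rate, however the runtime is sliced.

```python
RATE_PER_SECOND = 0.0002028  # USD per second, as quoted on this page

# Hypothetical usage-history entries: seconds covered by each billing check
intervals = [33, 35, 38, 31, 34]

# One line item per check, as it might appear in a usage history
line_items = [seconds * RATE_PER_SECOND for seconds in intervals]

total_seconds = sum(intervals)
total_from_items = sum(line_items)
total_from_runtime = total_seconds * RATE_PER_SECOND

print(total_seconds, f"{total_from_items:.6f}", f"{total_from_runtime:.6f}")
```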

Start with 5 Minutes FREE

Test GPUse risk-free with our grace period. No credit card required.

Your agent provisions a GPU, completes the work, then surfaces the checkout link only if you want to continue.

Get Started

Questions about billing? Email support@gpuse.com or log in to view detailed cost breakdowns for each GPU instance via your dashboard.