Sellers on Carousell, Xianyu, and eBay list RTX 3090s as gaming cards and used Mac Studios as consumer electronics. They are actually selling AI inference nodes. The information gap is real: secondhand sellers price by consumer electronics comps, not by AI infrastructure comps. The structural gap is also real: institutional buyers cannot purchase Mac Studios on Carousell — they buy new, in volume, from Apple. Individual buyers access a market that institutions structurally cannot.
Bandwidth is the only metric that matters for autoregressive LLM inference. Every token generated requires loading all model weights through memory once. GB/s beats TFLOPS. Secondhand sellers price by TFLOPS (gaming benchmarks). We buy by GB/s (inference throughput). That is the entire thesis.
Second-hand consumer compute hardware is systematically mispriced relative to its value as AI inference nodes. The mispricing arises from two structural conditions: (1) sellers use consumer electronics pricing frameworks (gaming GPU benchmarks, consumer laptop resale comps) rather than infrastructure pricing frameworks (GB/s of memory bandwidth, GB of VRAM, token throughput), and (2) the buyers who would correctly price this hardware — hyperscalers, AI companies, data centers — are structurally excluded from the secondhand consumer market. They buy new, in bulk, via enterprise contracts. A Mac Studio M2 Ultra 192GB cannot be procured on Carousell at any price by an institution. An individual can buy it there for $1,800–2,200.
Before treating this as an investment, six hidden assumptions must be surfaced and stress-tested.
| Assumption | The Risk | Stress Test |
|---|---|---|
| Local inference demand grows | Commodity API prices keep falling (DeepSeek V3 is already $0.89/M blended). If cost reaches $0.05/M, local hardware generates less value than its electricity costs. | Mitigation: buy below the consumer electronics resale floor. The hardware's value floor is not inference — it is gaming card / workstation resale. Inference is upside. |
| Sellers remain uninformed | Carousell and eBay pricing algorithms surface AI demand signals; prices correct toward infrastructure comps within 12–18 months. | Partially happening already. RTX 4090 is fully priced by practitioners. The window is not permanent — act on highest-conviction SKUs now. |
| Hardware holds residual value | Next-generation models require 48GB+ minimum VRAM. 24GB cards become worthless for inference overnight. Resale as gaming GPU still possible but at lower price. | Real risk. Weight 48GB+ options more heavily. The 24GB play (RTX 3090) works only if you personally use the inference — not as a pure asset hold. |
| Power costs manageable | Electricity rates spike. At $0.40/kWh, RTX 3090 electricity cost nearly triples versus the $0.15/kWh baseline, and the inference-value breakeven shifts dramatically. | Model electricity cost at your actual rate — Singapore/HK rates vary significantly. At $0.30/kWh: still viable for heavy users. At $0.40/kWh: marginal for light users. |
| Exit is liquid | You need to sell 10 RTX 3090s and crater your own price. Secondhand GPU market is thin. Large positions are illiquid by definition. | Hard position limit: max 3 units of any single SKU. This is not a scale trade. It is a personal infrastructure trade. |
| Apple Silicon advantage persists | NVIDIA releases a 96GB consumer card at $1,500. Apple Silicon bandwidth per dollar advantage collapses. | Possible but not imminent. NVIDIA’s 2026 roadmap does not show a consumer 96GB card at that price. Even if announced, secondhand Apple Silicon price decay is slow — resale holds better than NVIDIA for 24 months. |
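The power-cost stress test above is one line of arithmetic. A minimal sketch, assuming worst-case 24-hour operation at full TDP (single-user inference is bursty, so real cost is lower):

```python
# Daily electricity cost: (TDP in kW) x hours x rate.
# Assumes continuous full-TDP draw, which overstates cost
# for bursty single-user inference.
def electricity_cost_per_day(tdp_watts: float, rate_per_kwh: float,
                             hours: float = 24.0) -> float:
    return tdp_watts / 1000 * hours * rate_per_kwh

for rate in (0.15, 0.30, 0.40):
    # RTX 3090 at 350W TDP
    print(f"${rate:.2f}/kWh -> ${electricity_cost_per_day(350, rate):.2f}/day")
```

At $0.15/kWh this reproduces the $1.26/day figure used throughout; at $0.40/kWh it rises to $3.36/day.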
Understanding where consensus sits determines where edge exists. Consensus-priced assets have no alpha. Below-consensus assets where our model disagrees have edge.
| Group | Awareness | Sentiment | Edge Available? |
|---|---|---|---|
| Mass market (Carousell/eBay sellers) | Unaware | Prices as gaming GPU or consumer electronics | Yes — maximum edge |
| Institutional (hyperscalers, AI labs) | Fully aware | Structurally excluded from secondhand market | Yes — structural exclusion edge |
| r/LocalLLaMA practitioners | Aware | RTX 3090: +10 (neutral/mixed). Mac mini M4 Pro: +30 (bullish) | Partial — see below |
| RTX 4090 buyers | Fully aware | Consensus priced in by practitioners | No edge |
RTX 3090 — Sentiment: Neutral/Mixed (+10). Threads: “RTX 3090 in 2026” and “Talk me out of buying RTX 3090 just for local AI.” The practitioner consensus is ambivalent: 24GB is tight for 70B models, the card is aging, and the RTX 4090 exists. This ambivalence is the edge — the market has not bid up RTX 3090 prices to reflect its actual inference throughput. Our thesis: CONTRARIAN. Neutral sentiment on a card with 936 GB/s bandwidth at $600–900 is the signal.
Mac mini M4 Pro 64GB — Sentiment: Bullish (+30). Thread “Mac Mini looks compelling now... Cheaper than a 5090 and near double the VRAM” had 911 upvotes. This is forming consensus. When r/LocalLLaMA is bullish at +30, the edge is shrinking. Our thesis: ALIGNED but edge is narrowing. The M4 Pro is correctly valued. Wait for M5 release to drop M4 prices, or focus on M2 Ultra used instead.
Autoregressive LLM inference is memory bandwidth-bound, not compute-bound. Every token generated requires loading all model weights through memory exactly once. A 70B parameter model at Q4 quantization occupies approximately 40GB. Generating one token requires moving 40GB of data through the memory system. At 936 GB/s (RTX 3090), that is 40GB ÷ 936 GB/s = 42.7ms latency per token, or approximately 23 tokens/second, ignoring other bottlenecks. TFLOPS determine how fast you compute the attention mechanism — which adds roughly 10–20% to total latency on large batches but is negligible at batch size 1 (single-user inference). Therefore: GB/s is what you buy. TFLOPS are marketing.
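The bandwidth arithmetic generalizes to a two-line sketch — model footprint and memory bandwidth are the only inputs. Real throughput lands below this ceiling because the calculation ignores compute, KV-cache reads, and kernel overhead:

```python
# Bandwidth-bound decode ceiling: each generated token streams the
# full weight set through memory once, so tok/s <= bandwidth / size.
def token_ceiling(model_gb: float, bandwidth_gbs: float) -> float:
    return bandwidth_gbs / model_gb

def latency_ms_per_token(model_gb: float, bandwidth_gbs: float) -> float:
    return model_gb / bandwidth_gbs * 1000

# 70B at Q4 (~40GB) through an RTX 3090's 936 GB/s
print(f"{token_ceiling(40, 936):.1f} tok/s ceiling")    # ~23.4
print(f"{latency_ms_per_token(40, 936):.1f} ms/token")  # ~42.7
```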
No-lose price = (inference value generated × P(thesis holds))
+ (resale value as consumer electronics × P(thesis breaks))
- electricity cost over holding period
- time cost (acquisition + management + exit)
The formula reveals the key insight: when the consumer electronics resale floor is high enough, you do not need the thesis to hold to avoid losing money. The inference value is pure upside on top of an already-safe resale position.
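The formula maps directly to a function. Every input is an assumption the buyer supplies; the example numbers below are hypothetical, not derived from any listing:

```python
def no_lose_price(inference_value: float, p_thesis: float,
                  resale_value: float, electricity_cost: float,
                  time_cost: float) -> float:
    """Maximum purchase price at which the expected outcome is
    non-negative: inference value weighted by P(thesis holds),
    resale floor weighted by P(thesis breaks), minus carry costs."""
    p_breaks = 1 - p_thesis
    return (inference_value * p_thesis
            + resale_value * p_breaks
            - electricity_cost
            - time_cost)

# Hypothetical inputs: $966 inference value over the hold, 55% thesis
# odds, $400 gaming resale floor, $113 electricity, $50 time cost.
print(f"${no_lose_price(966, 0.55, 400, 113, 50):.0f}")
```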
| Parameter | Value | Notes |
|---|---|---|
| VRAM / Bandwidth | 24GB GDDR6X / 936 GB/s | Runs 8B–32B models comfortably; a Q4 70B (~40GB) needs two cards or aggressive ~2-bit quantization |
| Token throughput | ~23 tok/s bandwidth ceiling for a 40GB model; 60–100 tok/s (8B) | Single user, batch 1 |
| Used price | $600–900 | eBay / Xianyu; varies by condition |
| Power draw | 350W TDP | $1.26/day at $0.15/kWh, 24h |
| Heavy dev savings (10M tok/day vs Sonnet) | $118.74/day net | $120 API saved − $1.26 electricity |
| Realistic dev savings (1M tok/day) | $10.74/day net | $12 API saved − $1.26 electricity |
| Payback at heavy use | 5 days | At $600 entry, 10M tok/day |
| Payback at realistic use | 56 days | At $600 entry, 1M tok/day |
| Consumer resale floor | $400+ | As gaming GPU; provides downside protection |
| No-lose price | $800 or below | Below this: inference is pure upside on gaming resale value |
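The payback rows follow from a one-line calculation; the API savings and electricity figures are the table's assumptions, not measurements:

```python
def payback_days(entry_price: float, api_saved_per_day: float,
                 electricity_per_day: float) -> float:
    """Days until net daily savings recoup the purchase price."""
    net = api_saved_per_day - electricity_per_day
    if net <= 0:
        return float("inf")  # never pays back at this usage level
    return entry_price / net

print(payback_days(600, 120.0, 1.26))  # heavy use: ~5 days
print(payback_days(600, 12.0, 1.26))   # realistic use: ~56 days
```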
The A40 is a 48GB GDDR6 workstation card at 696 GB/s. It runs 70B models comfortably and 405B at aggressive quantization. It is cheap not because it underperforms, but because its passive cooler requires a server chassis or DIY open-air cooling. Workstation builders avoid it. The play: add $50 in fans to an open-air rig and unlock 48GB at 696 GB/s for $1,800–2,200. The A6000 Ampere (identical VRAM, better bandwidth) costs $2,200–3,500. The A40 is the same capacity at a slight bandwidth penalty and $400–1,700 cheaper. If you can handle the passive cooler friction, this is exceptional value per GB.
This is the highest-conviction position in the entire universe. The M2 Ultra Mac Studio with 192GB unified memory launched at $6,000–7,000+. Used, it trades at $1,800–2,200 because buyers treat it as a consumer Mac desktop. It is not. It is 192GB of unified memory at 800 GB/s — roughly half the memory bandwidth of an A100 40GB ($8,000–14,000 used) at a fraction of the cost, with enough capacity to run Llama 3.1 405B at ~3-bit quantization, DeepSeek 67B comfortably, and Llama 3.3 70B at FP16. The structural edge: Apple does not sell 192GB Mac Studios at used prices in volume. An AI lab cannot procure 100 of these from Carousell. You can procure one.
The V100 PCIe 32GB offers 897 GB/s of HBM2 bandwidth at $600–1,200 used. HBM2 is the memory architecture used in data center GPUs; its bandwidth-per-dollar ratio at these prices is exceptional. The caveats are real: PCIe 3.0 limitation, aging Volta architecture with driver quirks, and model scale may outpace 32GB faster than 48GB+ options. This is a speculative position for buyers comfortable with the operational overhead of legacy data center hardware in consumer settings.
The RTX 3090 Ti offers 1,008 GB/s — equal to the RTX 4090 — at $700–1,100. The Ti suffix causes buyers to anchor on gaming benchmarks (“overkill”) rather than inference throughput (“bandwidth leader”). The RTX 4090 at the same bandwidth costs $1,600–2,000 and draws 50W more power. At $700–900, the 3090 Ti is systematically underpriced for inference relative to its bandwidth spec. It often appears in listings beside regular 3090s with only a 10–15% price premium despite 7.7% higher bandwidth. This is a pricing anomaly driven by naming convention confusion.
| Asset | Conviction | Current Price | Model Price (as inference node) | Buy Threshold | Position Size |
|---|---|---|---|---|---|
| RTX 3090 | CONVICTION | $600–900 | $900–1,200 | Below $800 | 2–5% of investable capital |
| RTX 3090 Ti | CONVICTION | $700–1,100 | $1,000–1,400 | Below $900 | 2–5% of investable capital |
| A40 48GB | SPECULATIVE | $1,800–3,500 | $2,500–4,000 | Below $2,200 | 0.5–2% of investable capital |
| M2 Ultra 192GB | HIGH CONVICTION | $1,800–2,200 | $3,000–4,000 | Below $2,000 | 5–10% of investable capital |
| V100 32GB PCIe | SPECULATIVE | $600–1,200 | $800–1,500 | Below $800 | 0.5–2% of investable capital |
| Mac mini M4 Pro 64GB | MONITOR | $1,200–1,800 | $1,500–2,000 | Wait for M5 release to drop M4 prices | $0 for now |
| RTX 4090 | NO EDGE | $1,600–2,000 | $1,600–2,000 | Market-priced by practitioners | $0 |
Note: M2 Ultra scores lower on GB/s per dollar but uniquely enables a 192GB unified address space for models that cannot be quantized onto smaller VRAM without significant quality loss. A pure GB/s-per-dollar ranking does not capture the "model fits or does not fit" binary — and the M2 Ultra is the only consumer option that runs 405B-class models at all.
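The GB/s-per-dollar comparison can be made concrete. A sketch using the midpoint of each asset's current price range — the midpoints are this sketch's assumption, not quoted transactions, and the ranking deliberately omits the capacity binary:

```python
# (bandwidth GB/s, (price range low, high)) per asset,
# bandwidth and price figures reused from earlier sections
ASSETS = {
    "RTX 3090":       (936,  (600, 900)),
    "RTX 3090 Ti":    (1008, (700, 1100)),
    "A40 48GB":       (696,  (1800, 3500)),
    "M2 Ultra 192GB": (800,  (1800, 2200)),
    "V100 32GB PCIe": (897,  (600, 1200)),
    "RTX 4090":       (1008, (1600, 2000)),
}

def gbs_per_dollar(bandwidth: float, price_range: tuple) -> float:
    return bandwidth / (sum(price_range) / 2)

for name, (bw, prices) in sorted(
        ASSETS.items(), key=lambda kv: -gbs_per_dollar(*kv[1])):
    print(f"{name:16s} {gbs_per_dollar(bw, prices):.2f} GB/s per dollar")
```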
| Scenario | RTX 3090 Outcome | M2 Ultra Outcome | Probability Est. |
|---|---|---|---|
| Base: thesis holds, moderate use | +60–100% value as inference node; resale at $500–700 | +80–140% value vs purchase price; resale holds $1,600+ | 55% |
| API collapse: inference value drops 10x | Inference useless; resale as gaming GPU $400–600 | Inference useless; resale as consumer Mac $1,400–1,800 | 20% |
| Model scale: 48GB becomes minimum | 24GB worthless for inference; gaming resale $350–500 | 192GB remains viable; resale $1,500+ | 15% |
| Bull: local inference demand spikes | RTX 3090 revalued at $1,100–1,400; strong exit | M2 Ultra revalued at $3,500+; illiquid but high value | 10% |
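The scenario table can be probability-weighted into a rough expected exit value. The entry price ($700) and the midpoint exit values below are this sketch's assumptions; it deliberately excludes inference value generated during the hold, which is the upside the thesis actually targets:

```python
# (probability, exit-value midpoint) per scenario, RTX 3090
SCENARIOS = [
    (0.55, 600),   # base: resale $500-700
    (0.20, 500),   # API collapse: gaming resale $400-600
    (0.15, 425),   # model scale: gaming resale $350-500
    (0.10, 1250),  # bull: revalued $1,100-1,400
]

expected_exit = sum(p * v for p, v in SCENARIOS)
print(f"expected exit value: ${expected_exit:.0f} vs $700 entry")
```

An expected exit of roughly $619 against a $700 entry means the position only works if the inference value consumed during the hold covers the gap — consistent with the "use it yourself" condition in the assumptions table.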
These are the signals to track on a monthly basis to confirm or invalidate the thesis.
| Signal | What to Track | Source | Frequency |
|---|---|---|---|
| r/LocalLLaMA RTX 3090 sentiment | Is neutral/mixed sentiment holding, or is it shifting to bullish? If bullish: edge is shrinking. | Reddit r/LocalLLaMA | Monthly |
| eBay / Carousell price drift | RTX 3090 median price. If approaching $1,000+: buyers have repriced. Exit window is closing. | eBay sold listings; Carousell SG/HK | Monthly |
| Frontier model minimum VRAM | Does the leading open-weight model (Llama 4, etc.) require 48GB+ for practical use? If yes: 24GB cards are obsolete. | Hugging Face model cards; r/LocalLLaMA | On release |
| Commodity API blended price | DeepSeek V3 blended price per million tokens. If below $0.10/M: electricity parity pressure begins. | Artificial Analysis; provider pricing pages | Monthly |
| M2 Ultra used price | Has secondhand price risen above $2,500? That closes the arbitrage window. | eBay sold; Swappa; Xianyu | Monthly |
| NVIDIA 48GB+ consumer card announcement | Any announcement of a sub-$1,500 consumer card with 48GB+ VRAM invalidates the 24GB edge. | NVIDIA investor days; AnandTech; Tom’s Hardware | Quarterly |
| Sensor | Signal | Last Reading | Quality |
|---|---|---|---|
| r/LocalLLaMA RTX 3090 thread sentiment | Practitioner demand signal | +10 (neutral/mixed). Threads: “RTX 3090 in 2026,” “Talk me out of buying RTX 3090 just for local AI” | Active |
| r/LocalLLaMA Mac mini M4 Pro sentiment | Practitioner demand signal | +30 (bullish). Thread “Mac Mini looks compelling now” had 911 upvotes | Active |
| eBay sold listings (RTX 3090) | Real transaction price discovery | $600–900 range; median ~$720 | Periodic |
Priority sensor to build: a hardware price alert monitor for specific SKUs on Carousell SG/HK and Xianyu. Target implementation: a lightweight Python scraper that runs daily, extracts listings for RTX 3090, RTX 3090 Ti, A40 48GB, and M2 Ultra 192GB, and writes to a time-series store. Alert on: median price crossing the buy threshold upward (edge is closing) or downward (buying opportunity).
# Sensor spec: secondhand_compute_price_monitor
SKUs = [
"RTX 3090", # buy_threshold: $800
"RTX 3090 Ti", # buy_threshold: $900
"A40 48GB", # buy_threshold: $2,200
"Mac Studio M2 Ultra 192GB", # buy_threshold: $2,000
"V100 32GB", # buy_threshold: $800
]
Sources = ["carousell_sg", "carousell_hk", "xianyu", "ebay_sold"]
Alerts = {
"below_threshold": "BUY SIGNAL — price below no-lose threshold",
"above_exit": "EXIT SIGNAL — price above model price, edge closing",
"volume_spike": "SUPPLY SIGNAL — unusual listing volume (impending oversupply)"
}
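A minimal check implementing the alert logic might look like the following sketch. The listing prices are placeholder inputs the scraper would supply, the thresholds mirror the spec's comments, and the volume-spike alert is omitted since it needs listing counts rather than prices:

```python
from statistics import median
from typing import Optional

# sku: (buy_threshold, model_price_upper) per the spec above
THRESHOLDS = {
    "RTX 3090":    (800, 1200),
    "RTX 3090 Ti": (900, 1400),
}

def check_alerts(sku: str, listing_prices: list) -> Optional[str]:
    buy, exit_ = THRESHOLDS[sku]
    m = median(listing_prices)
    if m < buy:
        return f"BUY SIGNAL - {sku} median ${m:.0f} below ${buy}"
    if m > exit_:
        return f"EXIT SIGNAL - {sku} median ${m:.0f} above ${exit_}"
    return None

print(check_alerts("RTX 3090", [650, 720, 780]))  # median $720 -> BUY
```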
The mispricing is real and currently actionable on two SKUs. RTX 3090 and M2 Ultra 192GB are priced by their respective markets as consumer electronics; they function as AI inference infrastructure. The structural exclusion of institutional buyers from the secondhand consumer market is durable and not arbitrageable by parties who would close the gap.
The conditions matter. This is not “buy any used GPU.” It is: (1) buy RTX 3090 or 3090 Ti below $800–$900 if you have genuine heavy inference consumption today; (2) buy M2 Ultra 192GB below $2,000 if you need to run 70B+ models at quality and have use for it; (3) do not buy RTX 4090 — the edge is fully priced; (4) the A40 is speculative but high-upside for buyers who can manage the passive cooler constraint.
Exit strategy is as important as entry. The asset is illiquid. The consumer electronics resale floor provides downside protection — use it. Monitor the three thesis invalidation signals monthly: API price collapse, model VRAM floor rising past 24GB, and Carousell price drift past buy threshold. When the edge closes, exit at consumer electronics price — you have lost nothing. When the thesis holds, you have run inference infrastructure for the cost of a gaming card.