Learn · Intermediate

How to run AI on free APIs (with 9router)

Almost everything you read on this site, the research behind the news, the fact-checking, the first drafts, runs on AI models that cost us nothing to call. Not a trial, not a teaser. Real, capable models, used heavily, every day, for a bill of zero dollars. The trick is not a secret coupon. It is knowing which companies give model access away, and using a small piece of free software to juggle them so smoothly that it feels like one paid service. That piece of software, in our case, is called 9router. This is how it works, and how you can set up the same thing.

First, what an "API" even is here

When you use a chatbot in your browser, you are a person clicking a website. An "API" is the version of that same model meant for programs instead of people: your code sends a question, the model sends an answer back, no website involved. That is how anyone builds an app on top of AI. Normally you pay per use, billed by the token, the chunks of text the model reads and writes. Read more about tokens in what is a context window. Those per-token charges are exactly what we avoid.

The surprising part: a lot of this is given away

Several companies hand out genuine free API access, each for their own strategic reason. These four are the backbone of our setup:

NVIDIA (sign up at build.nvidia.com) wants developers hooked on its chips, so it hosts a large catalog of popular open models, the Llama, Qwen, DeepSeek, Nemotron and GLM families among them, and lets you call them through a free API key. This is the workhorse of our stack.
Groq (console.groq.com) runs open models on its own custom hardware and offers a free tier mostly to show off how blisteringly fast it is. Same kinds of open models, answered quickly.
Google gives away access two ways: AI Studio (aistudio.google.com) has a free tier for its Gemini models, and a new Google Cloud account comes with a pile of free credits you can spend on those same Gemini models through Vertex, the cloud version, which for a small operation can last a very long time.
OpenRouter (openrouter.ai) is an aggregator that keeps a rotating set of open models tagged "free", which you call with the same single key. Handy as a catch-all backstop.

Exactly which models are free, and how much you get, shifts over time, so check each site's current free tier rather than trusting a number you read somewhere. None of this is charity, and none of it is unlimited, which is the catch we will get to. But stacked together, it is more than enough to run a serious workload. These are mostly open-weight models, the same family of freely-shared models we cover elsewhere, which is exactly why so many providers can offer them.

The problem with free: rate limits, and a different door for each

Free access always comes with a leash: a rate limit, a cap on how much you can ask for in a given window before the provider says "too many requests, slow down." Lean on any single free source and you hit that wall constantly. The obvious fix is to spread your work across several providers, but now you have a new headache: each one speaks a slightly different dialect, wants its own login key, and lives at its own address. Wiring your app to all of them by hand is miserable.

This is the exact job an API router solves.

What a router does, in plain terms

Think of a power strip with one plug going into the wall and many sockets on the front. Your devices all plug into the strip and do not care which outlet behind the wall is feeding it. A router for AI works the same way. It gives you a single address to send every request to. Behind that single address, it holds all your different provider logins and decides, request by request, which one to actually use.

The real magic is what happens when a provider taps out. You arrange your providers in a priority order, a fallback chain. The router tries the first. If that one is rate-limited or down, it quietly slides to the second, then the third, without your app ever noticing. You write your program once, against one address, and the router absorbs all the messiness of the free-tier world behind it.

How we actually run it

We use 9router, a small, free, open-source router (9router.com) you run on your own machine. Ours has a handful of free providers loaded in and ordered by preference: a couple of free NVIDIA developer accounts first, then Groq, then Google's credit-backed models, then OpenRouter's free models, with a model running locally on our own computer as the final safety net. When the busy NVIDIA tier starts throwing "slow down" errors during a big research run, requests spill automatically to the next provider in line. The work just keeps flowing. From the point of view of our scripts, there is one endpoint, living at a local web address on this machine, and it never sends a bill.

That is the whole secret to "$0 research." Not one magic free model, but several modest free tiers, chained so that the group covers what no single one could.

Do it yourself

You need a computer with Node.js installed (the runtime a lot of developer tools use). Then:

1. Install and start the router. In a terminal:

``` npm install -g 9router 9router ```

It starts a small server on your machine and opens a control panel in your browser automatically (a local-only address, nothing exposed to the internet). The endpoint your apps will call sits at the same address, by default http://localhost:20128/v1.

2. Get a free key from each provider. Sign up on each developer site from the list above and copy the API key it hands you: NVIDIA at build.nvidia.com, Groq at console.groq.com, Google at aistudio.google.com (free Gemini tier, or Vertex for the credits), and OpenRouter at openrouter.ai. Every one is a free signup, and you can start with just one and add more later.

3. Add them to the router. In the 9router control panel, add a "provider connection" for each one, paste its key, and set a priority number. Lower priority numbers get tried first, so put the provider with the most generous limits at the top and your last-resort option at the bottom.

4. Build a fallback chain. Group your providers into an ordered list so that if the first is busy, the router moves to the next automatically. This is the part that turns several twitchy free tiers into one dependable service.

5. Point your app at the router. Almost every AI tool and code library can be told to use a custom "base URL" instead of a paid company's. Set that base URL to your router's address (http://localhost:20128/v1), drop in your fallback chain as the model name, and you are done. Because the router speaks the same common dialect the big providers use, most existing code needs only that one line changed.

The honest catches

This is real, but it is not magic, and pretending otherwise will get you burned:

The limits are real and they move. Free tiers throttle you, and the exact ceiling often is not even published, it shifts with how busy the provider is. A fallback chain softens this; it does not abolish it. For heavy, time-sensitive work you will still feel the squeeze, which is why we keep a local model as the last link.
Free tiers change without warning. A provider can tighten a limit or end a free program any week. Treat this as a clever way to get going cheaply, not as permanent free infrastructure to bet a business on.
Read each provider's terms. Free developer tiers come with rules about what you may do with them. Honor them. This article is about the legitimately-free developer tiers above, not about laundering a personal subscription into an API.
Your keys live on your machine. A self-hosted router keeps your provider keys in local storage on your own computer. That is good for privacy, but it also means securing that machine is on you.

What to take away

Capable AI is far cheaper to use than the monthly-subscription framing suggests, especially if your needs are bursty rather than constant. The companies are competing hard enough that several of them give real model access away to win your loyalty. A small router like 9router is the piece that makes those scattered free tiers usable together: one address out front, many free providers behind it, automatic fallback when any one of them taps out. It is the same idea as running your own little switchboard, and it is what lets a small operation, like this one, do a professional amount of AI work without a professional-sized bill. If you want to understand the cost difference it is exploiting, our lesson on training versus inference explains why using a finished model is the cheap part in the first place.