Set up a private LLM server

Why do this?

The scenario: you want to play around with the fast-moving world of large language models, but don’t want every random query / thought / personal health question to be logged or trained on.

Requirements

You need a machine to run the LLMs, and it needs lots of memory. How much depends on the model, of course, but larger models are noticeably better and more useful. Figure 16GB as the absolute minimum for a 7B-parameter model, and 27B models want 64GB or more. I’m using an M4 Max Mac Studio with 128GB and that seems plenty.

The software is simple – ollama to manage and serve models on the server, and the llm cli on your laptop. That way, the server does the heavy lifting and your laptop doesn’t. They don’t quite work together out of the box, though, so I wanted to share the necessary steps.

Server setup with Ollama

On the server:

brew install ollama
brew services start ollama

Now we need some models. With lots of memory, I grabbed some of the biggest ones, but you should browse the model library and choose a few. This step pulls many gigabytes, so beware on a metered connection.

ollama pull llama4:latest
ollama pull deepseek-r1:70b
ollama pull gemma3:27b
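
Once the pulls finish, ollama list will show what’s on disk and how big each model is:

ollama list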

Now we need to tell Ollama to listen on all interfaces – by default it binds only to localhost, so it isn’t reachable from other machines. There’s a GitHub issue about this if you want more details. We have to edit the plist:

open /opt/homebrew/opt/ollama/homebrew.mxcl.ollama.plist

In that file, we have to add the OLLAMA_HOST environment variable, set to 0.0.0.0:11434:
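
For reference, the addition is a standard launchd EnvironmentVariables block inside the plist’s top-level dict – your Homebrew-generated plist may differ slightly, but the shape is roughly this:

<key>EnvironmentVariables</key>
<dict>
    <key>OLLAMA_HOST</key>
    <string>0.0.0.0:11434</string>
</dict>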

After that, save the plist and run

brew services restart ollama

To test the change, open the server’s Ollama URL in a browser. (My server’s hostname is axiom, FYI.)

open http://axiom:11434/

and you should see Ollama’s plain status message (“Ollama is running”).

That’s the server sorted – on to the client!

LLM client

The LLM cli by Simon Willison is just neat. It can do all sorts of things for you. We want to do the following:

  1. Install LLM
  2. Add our models to extra-openai-models.yaml, pointing them at Ollama’s OpenAI-compatible REST API
  3. Set the default model
  4. Set the sorta-not-needed API key (it’ll ask for it; give it the key ‘ollama’)

brew install llm
llm keys set ollama
llm keys path

Save that path – mine is

/Users/pfh/Library/Application Support/io.datasette.llm/keys.json

We need to create and populate
/Users/pfh/Library/Application Support/io.datasette.llm/extra-openai-models.yaml

Here’s a ChatGPT-generated shell script to query Ollama over HTTP and populate extra-openai-models.yaml:

#!/bin/bash

pushd /Users/pfh/Library/Application\ Support/io.datasette.llm

# Configuration
HOST="axiom.phfactor.net"
PORT="11434"
OUTPUT="extra-openai-models.yaml"
API_BASE="http://${HOST}:${PORT}/v1"
API_KEY_NAME="ollama"

# Fetch models from Ollama
echo "Fetching models from Ollama at ${HOST}:${PORT}..."
MODEL_LIST=$(curl -s "http://${HOST}:${PORT}/api/tags" | jq -r '.models[].name')

# Begin YAML
echo "# Auto-generated OpenAI-compatible model config for LLM CLI" > "$OUTPUT"
echo "# Source: http://${HOST}:${PORT}/api/tags" >> "$OUTPUT"
echo "" >> "$OUTPUT"

# Write each model as a separate YAML block
while IFS= read -r MODEL; do
  MODEL_ID="${MODEL//:/-}"  # replace colon with dash
  cat <<EOF >> "$OUTPUT"
- model_id: $MODEL_ID
  model_name: $MODEL
  api_base: "$API_BASE"
  api_key_name: $API_KEY_NAME
  can_stream: true
  supports_schema: true
  vision: true

EOF
done <<< "$MODEL_LIST"

echo "Wrote $OUTPUT with $(echo "$MODEL_LIST" | wc -l) models."

Here’s my models file:

# Auto-generated OpenAI-compatible model config for LLM CLI
# Source: http://axiom.phfactor.net:11434/api/tags

- model_id: qwen3-latest
  model_name: qwen3:latest
  api_base: "http://axiom.phfactor.net:11434/v1"
  api_key_name: ollama
  can_stream: true
  supports_schema: true
  vision: true

- model_id: deepseek-r1-70b
  model_name: deepseek-r1:70b
  api_base: "http://axiom.phfactor.net:11434/v1"
  api_key_name: ollama
  can_stream: true
  supports_schema: true
  vision: true

- model_id: llama4-latest
  model_name: llama4:latest
  api_base: "http://axiom.phfactor.net:11434/v1"
  api_key_name: ollama
  can_stream: true
  supports_schema: true
  vision: true

- model_id: gemma3-27b
  model_name: gemma3:27b
  api_base: "http://axiom.phfactor.net:11434/v1"
  api_key_name: ollama
  can_stream: true
  supports_schema: true
  vision: true

Set the default model – I’m still experimenting, but for now it’s

llm models default llama4-latest
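
With the default set, a plain llm 'your question' goes straight to the server; the -m flag picks any other model_id from the YAML. For example:

llm 'write a one-line awk to sum column 3'
llm -m gemma3-27b 'write a one-line awk to sum column 3'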

Important!

The api_base there is not what the docs show – I had to use ChatGPT (sauce for the gander) to figure out that you have to add the /v1 suffix, or the llm cli will return 404s.
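
A quick way to check the endpoint yourself (hostname and model name are from my setup; swap in yours) is to hit the OpenAI-compatible chat route directly:

curl -s http://axiom.phfactor.net:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:27b", "messages": [{"role": "user", "content": "Say hello"}]}'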

What do you get?

From my laptop, I can run random queries and it works great – no limits, ask anything (lots of stories about this!), and I feel oddly better about asking the basic, stupid tech questions that I “should” know or remember. For example:

  • What’s the makefile syntax for this?
  • Tell me how to glob the largest non-zero-file of episode.mp3 in a tree of files?
  • What’s the jq invocation to pull this?

llm 'nmap syntax to look for open tcp on a single host'
To use `nmap` to scan for open TCP ports on a single host, you can use the following basic syntax:

```bash
nmap -sT <hostname or IP address>
```

Here's a breakdown of the options used:
- `-sT` specifies a TCP connect scan, which is a type of scan that tries to establish a full TCP connection to the target ports. This is one of the most common scan types and works well for finding open TCP ports.

If you want to scan all 65,535 TCP ports (which can take a long time), you can use:

```bash
nmap -sT -p- <hostname or IP address>
```

However, for a more efficient and commonly used approach, scanning the top 1000 most common ports can be sufficient for many cases:

```bash
nmap -sT -p 1-1000 <hostname or IP address>
```

- `-p 1-1000` specifies the range of ports to scan. Adjust this range according to your needs.

### Additional Options

- **OS Detection:** `-O` (capital O, not zero) can help identify the operating system of the target, though this might not always be accurate or useful.
- **Version Detection:** `-sV` can provide information about the versions of services running on open ports.
- **Script Scanning:** `--script` can run NSE (Nmap Scripting Engine) scripts against the target.

### Full Command Example

For a more detailed scan:

```bash
nmap -sT -p 1-1000 -sV -O <hostname or IP address>
```

Make sure to replace `<hostname or IP address>` with the actual hostname or IP address of the target you're scanning.

### Permissions

Keep in mind that running `nmap` might require administrative or root privileges, especially for certain types of scans. If you run into issues, consider using `sudo` (on Unix-like systems) before your `nmap` command.

### Legal and Ethical Considerations

Always ensure you have permission to scan a target. Unauthorized scanning can be considered malicious and may lead to legal consequences.

Postscript

I opened an issue on the LLM cli project, so their docs may get updated to cover these steps.

Coding with ChatGPT

As I’ve written about before, LLMs are an amazing tool for programming. In addition to IDE plugins / autocomplete integrations (GitHub Copilot, Qodo), I’m finding utility in the pattern of a running conversation with ChatGPT in its dedicated app. I can ask anything there – snippets and one-offs and what-ifs. The full o1 model is really good.

Anyway, this is the current state – the side project (open source here) is an iOS app in SwiftUI that will automatically capture context using Bluetooth iBeacons, with the goal of generating a log that can become contract-engineering billing records. The idea:

  1. Enter region OFFICE at 8:00AM.
  2. Leave region OFFICE at 6:00PM.
  3. Enter region CAR at 2:15PM at GPS location LAT, LONG.

The GPS idea is a possible way to compute miles driven; it may or may not work. The regions will be marked using iBeacons – cheap, supported by the operating system, easy.

I don’t really know SwiftUI, so I just asked ChatGPT for code and started hacking. It works and continues to work! LLMs are amazing for learning this way. I can ask for random changes, explanations, error fixes; anything! Here’s today’s version of the app:

Current state showing real data.

Damn.

If you’ve not tried coding with an LLM, go try it. Treat it like a freshly graduated student – you can ask it anything, and it’ll know the most amazing ways to write code and solve problems. It’ll be wrong yet confident sometimes, but as a way of getting unstuck and having fun hacking, I’m having a ball.

Smokeping and Caddy 2: hard but worthwhile

Spent a few semi-pleasant hours today getting this working:

That’s Smokeping, proxied by Caddy 2 on my love-it-to-bits Raspberry Pi 4 web server.

Smokeping lets you track not just the yes-we-are-connected / no-we-are-not, but latency and packet loss and jitter. My network and ISP are generally solid, but this is an easy tool to have around. Once, that is, you have it installed.

I run Debian on my Pi, natch, and the wondrous Caddy to serve files and reverse proxy the various web apps. Not to mention automatic SSL certs from LetsEncrypt and the least verbose configuration possible. Smokeping, alas, uses the now-uncommon CGI interface, so gluing it all together took a while. Let me leave some notes for anyone else in this situation.

Basic install

apt install fcgiwrap
apt install smokeping
service fcgiwrap start
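
On my Debian install, fcgiwrap listens on a unix socket (the same path the Caddyfile below points at), so it’s worth confirming it exists before wiring up the proxy:

systemctl status fcgiwrap.socket
ls -l /var/run/fcgiwrap.socket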

The /etc/smokeping/config.d directory holds the files you’ll need to edit. In the General file, set:

cgiurl   = https://ping.phfactor.net/smokeping.cgi

Note that Caddy works best with name-based virtual hosts, so I’m using ping.phfactor.net – you’ll need that name in your DNS. Here’s the Caddyfile entry:

ping.phfactor.net {
	log {
		output file /var/log/caddy/ping.log
	}
	root * /usr/share/smokeping/www
	encode gzip
	file_server

	@cgi {
		path *.cgi
	}
	reverse_proxy @cgi unix//var/run/fcgiwrap.socket {
		transport fastcgi {
			split .cgi
			env SCRIPT_FILENAME /usr/share/smokeping/smokeping.cgi
		}
	}

	# Ensure CSS and JS files are served correctly
	@static {
		path /css/* /js/* /img/*
	}
	handle @static {
		file_server
	}

	# Try serving static files before falling back to CGI
	try_files {path} /{path}
}
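
After editing the Caddyfile, reload Caddy to pick up the new site (assuming Debian’s packaged systemd service):

sudo systemctl reload caddy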

Kinda ugly. Might be some cleanup possible there. I also had to modify the HTML template file /etc/smokeping/basepage.html to remove the /smokeping/ prefix from the CSS and JS URLs:

<link rel="stylesheet" type="text/css" href="/css/smokeping-print.css" media="print">
<link rel="stylesheet" type="text/css" href="/css/smokeping-screen.css" media="screen">

...

<script src="/js/prototype/prototype.js" type="text/javascript"></script>
<script src="/js/scriptaculous/scriptaculous.js?load=builder,effects,dragdrop" type="text/javascript"></script>
<script src="/js/cropper/cropper.js" type="text/javascript"></script>
<script src="/js/smokeping.js" type="text/javascript"></script>

For now, I’m only using the basic ICMP ping probe, but Smokeping supports more advanced probes such as DNS, HTTP, SSH login and others.
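
For the curious, the ping targets live in /etc/smokeping/config.d/Targets. A minimal sketch with placeholder hosts – adapt the names and addresses to your own network:

*** Targets ***

probe = FPing
menu = Top
title = Network Latency

+ Home
menu = Home
title = Home network

++ Router
menu = Router
title = LAN router
host = 192.168.1.1

++ ISP
menu = ISP
title = First hop past the router
host = gw.example.net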

Note that Safari seems a bit confused by Smokeping graphs, and caches old ones longer than it should. Chrome and Firefox do it right. Odd.

The results are pretty cool though.

Reddit on iOS minus ads

So a while ago, Reddit enshittified after taking PE money. Turned off the APIs, blocked third-party apps, etc. And the official app is a really shitty ad-laden experience. So. Do you have

  1. A Macintosh
  2. Some code/build experience
  3. An iPhone or iPad
  4. The desire to read Reddit
  5. A $99/year Apple Developer account
  6. Stubbornness?

The details would take ages to type out, thus numbers 2 and 6. Drop a comment if this is useful and I’ll write a followup; right now I’d guess I have maybe two-digit readership.

The source code that you want is called Winston, here on GitHub. Yes, like 1984. Clone it, load it into Xcode, and then modify the two bundle identifiers. I use the net.phfactor prefix since that’s my domain; be creative but they have to be unique to Apple.

I vaguely remember that you need to create a Reddit developer token, which is also painful but only needs doing once. The results are well worth the hassle. I just pulled main and rebuilt today after my build expired. (The $99 developer device builds are only good for a year. Apple forces everything through their App Store and this is as close as they allow. Yes, it sucks.)

And my local peeps

It’s good to be back.

Beware of censored LLMs

I’m a huge, huge fan of all things Simon Willison, but this latest post prompted me to write. Models trained by Alibaba, ByteDance and other Chinese companies have to adhere to Chinese censorship, and the companies have some so-far undisclosed way of removing or suppressing information in them. Qwen, for example, and the new DeepSeek-R1.

Simply ask this:

Tell me about Tiananmen Square. What happened there? Why is it famous? Why is it censored?
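
If you’ve got a local model running via Ollama and the llm cli (as in the server setup earlier on this page), that’s a one-liner – the model name is whatever you’ve pulled:

llm -m deepseek-r1-70b 'Tell me about Tiananmen Square. What happened there? Why is it famous? Why is it censored?'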

If the model is honest, it’ll tell you. If it’s censored, it may refuse outright or dodge the question.

I haven’t explored the censorship much past that – I’d assume that there are censored topics, altered facts and perhaps added bias. Caveat emptor.

Wildfire intensity scale

Years ago, during the 2016 Fort McMurray wildfire, I read an article (which I should have bookmarked) that discussed wildfire in physics terms – power per meter of fire front. Above some threshold, you literally and actually cannot douse the flames. Today, with the LA fires raging, I went searching and found this PDF by Joe H. Scott. The standard seems to be from Byram, G. M. 1959. Combustion of forest fuels. In: Forest fire: Control and use, 2nd edition. New York, NY: McGraw-Hill: chapter 1, 61-89.

Here’s the key bit from the Scott paper:

So a basic wildfire ticks along at 10 kW per meter of fire line, and a rager might hit 100 to 150 megawatts per linear meter.

Goddamn. No wonder you can’t extinguish them.

By way of comparison, a gallon of gasoline holds around 33 kilowatt-hours (roughly 120 megajoules) of energy. If I estimate right, each meter of a big fire’s front releases about a gallon of gasoline’s worth of energy every second. Not sure that helps my intuition, and I often get stoichiometry wrong anyway.
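
Rough arithmetic behind that comparison, with my rounding:

1 US gallon of gasoline ≈ 33 kWh ≈ 33 × 3,600 kJ ≈ 120 MJ
A 120 MW/m fire line releases 120 MJ per meter of front every second
120 MJ ÷ 120 MW ≈ 1 second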

Music service playlist migration

It will not come as news to anyone streaming music via Spotify, Apple Music, Amazon, Tidal, etc – the playlist is the proprietary bit. The music is identical but your curated playlists are a barrier to moving.

Today, I saw a Spotify playlist in this Cool Tools post:

 I’m in love with this “Halloween” playlist because it isn’t cheesy songs like the Monster Mash and Ghostbusters, instead, it’s an adults’ Halloweenish soundtrack featuring great moody music from bands like M83, the Cure, the National and more. This plays nonstop at my house from Labor Day through the end of October.

Here is the playlist link – it’s “Halloween is a Dead Man’s Party”.

But I don’t use Spotify. Because of the subsidized hardware, we use Amazon to stream to a bunch of Echos connected to speakers.

The solution is a free web app called Tune My Music, which Amazon lists as an approved way to import playlists. It can move playlists back and forth between a great number of services. For me, I just set up a new Spotify account (yay Hide My Email!), granted TMM access to it and to Amazon Music, and it copied the playlist over. Only one track was missing; good enough.

So maybe bookmark it in case you want to move between services.

Status games

Mother Jones today has an excellent story on Arlie Russell Hochschild’s book “Stolen Pride.” This quote in particular:

“We live in both a material economy and a pride economy, and while we pay close attention to shifts in the material economy, we often neglect or underestimate the importance of the pride economy. Just as the fortunes of Appalachian Kentucky have risen and fallen with the fate of coal, so has its standing in the pride economy…”

https://www.motherjones.com/politics/2024/09/jd-vance-arlie-russell-hochschild-hillbilly-elegy-stolen-pride-excerpt/

So close! What she’s describing is status: rank in the community, real or perceived. Will Storr wrote an excellent book about it, “The Status Game,” which I highly recommend. (My local library had it.)

The MJ story is excellent and well worth your time. The Storr book is longer; this review might help you decide if you’d find it worthwhile.