Development Update — June 13
A day that follows the plain-HTTP retirement to its logical conclusion: if deployment services live on the overlay, you need a clean way to see them from a browser without a clearnet hop — and that way is the resolving SOCKS5 proxy. The headline work makes those .dmsg / .skynet proxies smarter: a visor asked for its own key now short-circuits in-process instead of dialing a wasteful round-trip back to itself, and a configurable alias lets skywire.dmsg mean “the local visor’s landing page.” Around it: the CLI finally drops its dead plain-HTTP fallback (the resolver is the supported replacement), a discovery-side timeout-and-hang fix on /uptimes?v=v3, a self-healing --direct route for control-plane forwards, and the real fix for a dmsg-discovery CPU storm that had been pinning the service for hours.
Skywire: Self-Loopback and PK Aliases for the Resolving Proxies
The .dmsg / .skynet resolving proxies let a browser or curl reach a peer by key (http://<pk>.dmsg) over the overlay. But pointing one at the local visor’s own key did something silly: it dialed out over dmsg/skynet and back to the same visor. That round-trip is wasteful, dmsg does not loop back to itself in any case, and it was a frequent source of dmsg-202 errors.
3102 feat(resolver): self-loopback + PK alias for the dmsg/skynet resolving proxies fixes both ends of the ergonomics. Self-loopback (on by default) serves a request for this visor’s own PK straight from the local service registry in-process over net.Pipe — no transport hop at all — with "self_loopback": false available to exercise the full self-transport path as a genuine reachability test. Aliases let a friendly label resolve to a PK, so skywire.skynet / skywire.dmsg loads the local visor’s port-80 landing page through the resolver without hardcoding the key. The in-process conn is wrapped in tcpAddrConn so go-socks5’s unchecked *net.TCPAddr assertion — which builds the BND reply after the Dial callback returns — doesn’t panic on net.Pipe’s address type.
3103 refactor(resolver): single alias string for the local PK, not a label→PK map trims the surface the day after. The aliases map shipped in #3102 could point any label at any PK, but the only intended use was giving this visor’s own key a friendly hostname — so the general map was more machinery than the feature needed. It collapses to a single alias string per resolver (default "skywire"), dropping the “self”/PK-hex parsing and its error path while leaving ParseResolverHost’s signature and tests unchanged. The namespace for a real DNS-over-skywire scheme stays unclaimed for later.
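A single-alias lookup of this shape might look like the sketch below. resolveHost is a hypothetical helper written for illustration; it is not ParseResolverHost, whose signature the update does not show, and the placeholder key string is not a real PK.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveHost (hypothetical) splits "<label>.<suffix>" and, when the label
// equals the configured alias, substitutes the local visor's public key.
// Any other label is passed through as the PK to dial.
func resolveHost(host, alias, localPK string) (pk, suffix string, ok bool) {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	i := strings.LastIndex(host, ".")
	if i <= 0 || i == len(host)-1 {
		return "", "", false
	}
	label, sfx := host[:i], host[i+1:]
	if sfx != "dmsg" && sfx != "skynet" {
		return "", "", false // not a resolving-proxy hostname
	}
	if alias != "" && label == strings.ToLower(alias) {
		return localPK, sfx, true
	}
	return label, sfx, true
}

func main() {
	const localPK = "local-pk-placeholder" // stands in for this visor's key
	for _, h := range []string{"skywire.dmsg", "skywire.skynet", "example.com"} {
		pk, sfx, ok := resolveHost(h, "skywire", localPK)
		fmt.Println(h, pk, sfx, ok)
	}
}
```

With one string instead of a map there is no per-label PK parsing left to fail, which is the error path the refactor removes.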
Skywire: Drop the CLI’s Plain-HTTP Fallback
3100 fix(cli): drop plain-HTTP fallback from deployment fetches + document the resolving proxy removes the last clearnet escape hatch from the CLI’s service-fetch chain. The plain-HTTP step in FetchServiceURL was already skipped for every deployment service (all are dmsg-mapped) and had no live HTTP-only caller — dead code kept “just in case,” and the case it fired in was an erroneous one. The chain is now CXO → visor-RPC-DmsgHTTP → direct-DMSG only, and the now-meaningless --no-http flag is gone. The one advantage the fallback offered — eyeballing a service in a browser without setup — is better served by the resolving proxy, so the PR also adds an operator guide (docs/guides/resolving-proxy.md) covering enable/status, addressing, curl and browser usage, and TLS, plus expanded skysocks and VPN client guides.
Skywire: The /uptimes?v=v3 Hang
3099 fix(discovery): GET /uptimes?v=v3 hang on dmsg-discovery + service-discovery chases down why cli ut mdisc graph (and any v3 uptimes fetch) returned EOF over dmsg and hung past 60s on clearnet, while v2 returned fine. The v3 timeline handler computed a per-visor, per-day timeline by issuing 288 individual GETBIT commands per visor-day. For the live network (~1375 heartbeat visors over 7 days) that’s roughly 2.7 million Redis ops across ~9.6k sequential pipeline round-trips, running the handler past 60s. Over dmsg the server’s 3-second write timeout fired mid-handler and closed the stream, so the CLI read EOF; on clearnet it just hung. The fix reads each day’s bitmap with one MGET (extracting the 288 bits in Go, MSB-first to match GETBIT) instead of 288 commands, pipelines the seven per-day HGets into one round-trip, and raises the shared dmsghttp server write timeout from 3s to 30s. The old 3s window was too short to serve heavy aggregation over dmsg, and raising it also unblocks other large dmsg-served CLI fetches that were intermittently EOF’ing. service-discovery had the identical blowup and gets the identical fix; transport-discovery already did the single-GET form. The change takes effect on redeploy, and is a prerequisite for #3100’s fallback removal.
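The client-side bit extraction can be sketched as follows. This illustrates the technique rather than the handler itself: bitAt and dayTimeline are hypothetical names, and the input is assumed to be the raw bitmap bytes returned for one visor-day. Redis GETBIT numbers bits from the most significant bit of the first byte and reads missing bits as 0, which the helper mirrors.

```go
package main

import "fmt"

const slotsPerDay = 288 // bits per visor-day, as described in the update

// bitAt mirrors Redis GETBIT on a raw string value: offset 0 is the most
// significant bit of byte 0, and offsets past the end read as 0.
func bitAt(bitmap []byte, offset int) bool {
	idx := offset / 8
	if offset < 0 || idx >= len(bitmap) {
		return false
	}
	mask := byte(0x80) >> uint(offset%8)
	return bitmap[idx]&mask != 0
}

// dayTimeline expands one day's bitmap into a per-slot slice in a single
// pass, replacing 288 separate GETBIT round-trips.
func dayTimeline(bitmap []byte) []bool {
	out := make([]bool, slotsPerDay)
	for i := range out {
		out[i] = bitAt(bitmap, i)
	}
	return out
}

func main() {
	// 0x80 sets slot 0; 0x01 in the second byte sets slot 15.
	tl := dayTimeline([]byte{0x80, 0x01})
	fmt.Println(tl[0], tl[1], tl[15], tl[16])
}
```

A 288-bit day fits in 36 bytes, so fetching the whole value costs far less than the per-bit commands it replaces.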
Skywire: A Self-Healing --direct Route for Control-Plane Forwards
3098 feat(skynet): --direct route — self-healing direct transport for control-plane forwards addresses a flaky-by-design behavior in skynet-client forwards. A forward (e.g. the rsn-pprof resolving-proxy bridge) silently rides whatever route the route-finder returns, and the finder only hands back a 1-hop route when an open direct transport already exists and its edge has propagated to the transport-discovery’s graph — otherwise the hop floor stays at 2 and the dial goes multihop. When the peer restarts and its direct transport drops, nothing recreates it, so the forward degrades to a flaky multihop route — observed live as ~50% request loss while the resolver itself was 100% reliable directly on the host. --direct bypasses the route-finder for a 1-hop dial and adds EnsureDirectTransport, which makes every dial create the direct transport (stcpr→sudph→dmsg) if none is open — so the forward self-heals when the transport drops instead of silently falling back to multihop.
Skywire: The dmsg-discovery CPU Storm, at the Root
3097 fix(dmsghttp): stop dmsg-discovery CPU storm — server idle timeout was defeating the client stream pool is the kind of fix that only lands after the wrong theory is ruled out. The dmsg-discovery had been pinned at 100% CPU for hours, and the obvious read was a request-rate storm. But a CPU profile of the pinned service showed ~40% in the noise-KK handshake responder (secp256k1 point multiplication) — the cost was a fresh handshake on essentially every request, not the request count. The client already pools idle dmsg streams (90s) so periodic discovery calls reuse a stream instead of re-handshaking — but the server set its HTTP idle timeout to 30s, below the 60s entry-refresh cadence. So the server tore the idle stream down before the client’s refresh could reuse it; the pool was silently defeated and every visor re-handshook the discovery service on every refresh, fleet-wide. Raising the server idle timeout to 115s (above the 60s refresh, just under the 120s dmsg read-deadline) lets a pooled stream span multiple refreshes, taking effect the moment the service redeploys with no fleet update. A per-PK request rate limiter — keyed by the noise-authenticated dmsg PK so it can’t be evaded by reconnecting or a spoofed X-Real-IP — is retained as complementary defense-in-depth.