zk0 Node Operators Guide

Welcome to the zk0 Node Operators Guide! This document provides everything you need to know to participate in the zk0 federated learning network as a node operator.

What is zk0?

zk0 is a federated learning platform for robotics AI, enabling privacy-preserving training of SmolVLA models across distributed clients using real-world SO-100/SO-101 datasets. Node operators contribute their private robotics datasets while maintaining full data privacy.

Getting Started

1. Apply to Become a Node Operator

To join the zk0 network:

  1. Review Requirements: Ensure you have:
    • A private robotics dataset (SO-100/SO-101 compatible)
    • A GPU-enabled machine (recommended for training)
    • A stable internet connection
    • Basic familiarity with Conda and tmux
  2. Submit Application: Create a new issue using our Node Operator Application Template

  3. Wait for Approval: Our team will review your application and contact you via Discord
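Before applying, it can help to confirm that the tools named in the requirements are present on the machine. The following is a minimal sketch, not part of zk0bot: missing_tools is a hypothetical helper name, and the list of tools to check (curl for the installer, conda, tmux) is taken from this guide.

```shell
# Hypothetical helper (not part of zk0bot): print each named tool that is
# not found on PATH, and return non-zero if at least one is missing.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example call (left commented out so that sourcing this file has no effect):
#   missing_tools curl conda tmux || echo "install the tools listed above first"
```

On NVIDIA machines, adding nvidia-smi to the list gives a rough check for the GPU requirement; that is an assumption about the hardware, since the guide does not name a GPU vendor.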

2. Install zk0bot CLI

Once approved, install the zk0bot CLI tool:

# One-line installer
curl -fsSL https://raw.githubusercontent.com/ivelin/zk0/dev/get-zk0bot.sh | bash

This will:

3. Configure Your Environment

Set up required environment variables in .env (auto-sourced by zk0bot.sh):

# .env example (create in ~/zk0)
HF_TOKEN=your_huggingface_token_here
WANDB_API_KEY=your_wandb_key_here  # optional, server-side only

Note: zk0bot.sh automatically sources .env after conda activation, propagating HF_TOKEN/WANDB_API_KEY to tmux Flower subprocesses (SuperLink/SuperNode). No manual export needed.
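Because zk0bot.sh reads credentials from .env, a missing or empty HF_TOKEN line is worth catching before a client starts. The sketch below is an illustration only: env_has_key is a hypothetical helper, and it assumes the simple KEY=value layout shown in the example above.

```shell
# Hypothetical helper (not part of zk0bot): succeed only if FILE exists and
# contains a line of the form KEY=<something>, optionally prefixed by "export".
# Note: any text after "=" counts, including a trailing comment.
env_has_key() {
  local file="$1" key="$2"
  [ -f "$file" ] || return 1
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=.+" "$file"
}

# Example call (commented out so that sourcing this file has no effect):
#   env_has_key "$HOME/zk0/.env" HF_TOKEN || echo "HF_TOKEN is not set in ~/zk0/.env"
```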

Full Production Session Example (Local Network)

Server Machine:

curl -fsSL https://raw.githubusercontent.com/ivelin/zk0/main/website/get-zk0bot.sh | bash
cd ~/zk0
zk0bot server start  # Auto-activates zk0 env; SuperLink ready

Client Machines (same LAN; add export ZK0_SERVER_IP=server_ip to ~/.bashrc):

curl -fsSL https://raw.githubusercontent.com/ivelin/zk0/main/website/get-zk0bot.sh | bash
cd ~/zk0
zk0bot client start shaunkirby/record-test  # Auto-activates zk0 env; or your private dataset
zk0bot client start ethanCSL/direction_test

On Server (submit run):

zk0bot run --rounds 20 --stream  # Full FL session, stateless; auto-zk0 env

Remote Clients: Set ZK0_SERVER_IP=public_server_ip. Use insecure=true for development only; enable TLS for production.
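Since clients locate the server through ZK0_SERVER_IP, a shell that has not sourced ~/.bashrc (for example a fresh tmux session) may start without it. A small guard such as the one below can make that failure explicit. This is a sketch under assumptions: server_address is a hypothetical helper, and the default port 9092 is an assumption chosen from within the 9091-9093 range mentioned in the Federation Flow diagram; check your own deployment for the actual Fleet API port.

```shell
# Hypothetical helper (not part of zk0bot): print "<ip>:<port>" built from
# ZK0_SERVER_IP, or fail with a message on stderr if the variable is unset.
# The default port 9092 is an assumption, not a value confirmed by this guide.
server_address() {
  local port="${1:-9092}"
  if [ -z "${ZK0_SERVER_IP:-}" ]; then
    echo "ZK0_SERVER_IP is not set" >&2
    return 1
  fi
  echo "${ZK0_SERVER_IP}:${port}"
}

# Example call (commented out so that sourcing this file has no effect):
#   addr="$(server_address)" && echo "clients will connect to $addr"
```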

Note: WandB logging is handled server-side only. Client training does not require WandB credentials.

4. Start Your Client

Light Test Production Run (Recommended):

  1. Server: zk0bot server start
  2. Clients: zk0bot client start shaunkirby/record-test
  3. Submit run: zk0bot run --rounds 3 --stream

Examples:

zk0bot client start shaunkirby/record-test
zk0bot client start ethanCSL/direction_test

Production Run (Stateless):

# Standard (pyproject.toml defaults) - runs all server rounds
zk0bot client start yourusername/your-private-dataset
zk0bot client start local:/path/to/your/dataset
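The two commands above use two dataset URI forms: a Hugging Face style owner/name identifier and a local: prefix followed by a filesystem path. The sketch below shows one way to tell them apart and to catch a mistyped local path before starting a client. It is illustrative only: dataset_uri_kind is a hypothetical helper, the labels it prints are made up for this example, and it assumes a local dataset is a directory.

```shell
# Hypothetical helper (not part of zk0bot): classify a dataset URI.
# Prints one of: local, local-missing, hub, unknown.
dataset_uri_kind() {
  local uri="$1"
  case "$uri" in
    local:*)
      # Strip the "local:" prefix and check that the directory exists.
      local path="${uri#local:}"
      if [ -d "$path" ]; then echo "local"; else echo "local-missing"; fi
      ;;
    */*)
      # Looks like owner/name; whether the repo exists is not checked here.
      echo "hub"
      ;;
    *)
      echo "unknown"
      ;;
  esac
}

# Example call (commented out so that sourcing this file has no effect):
#   dataset_uri_kind local:/path/to/your/dataset
```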

Your client will:

Server Operations (For Server Operators)

If you’re running a zk0 server:

# Start the server
zk0bot server start

# Check status
zk0bot status

# View logs
zk0bot server log

# Stop the server
zk0bot server stop

Monitoring and Troubleshooting

Check Status

zk0bot status

View Logs

# Server logs
zk0bot server log

# Client logs
zk0bot client log

Common Issues

  • tmux not found: sudo apt install tmux (Debian/Ubuntu) or brew install tmux (macOS)
  • Conda zk0 not active: conda activate zk0
  • Dataset not found: Verify the dataset path/URL and credentials
  • Connection failed: Check your internet connection and server availability
  • Installer fails: Check GitHub status, ensure curl is available, or clone manually: git clone https://github.com/ivelin/zk0.git ~/zk0; cd ~/zk0; ./get-zk0bot.sh

Dataset Requirements

Supported Formats

Quality Guidelines

Privacy Considerations

Dynamic Client Joining (Stateless)

Server Behavior

Client Lifecycle (Stateless)

Flower Deployment Engine

Federation Flow

sequenceDiagram
    participant Admin as Admin/Submission\n(zk0bot.sh run)
    participant SuperLink as SuperLink\n(flower-superlink)
    participant SuperNode1 as SuperNode 1\n(flower-supernode,\ndataset-uri=uri1)
    participant SuperNode2 as SuperNode 2\n(flower-supernode,\ndataset-uri=uri2)
    participant ServerApp as ServerApp\n(SuperExec process\non SuperLink host)
    participant ClientApp1 as ClientApp 1\n(SuperExec process\non SuperNode 1)
    participant ClientApp2 as ClientApp 2\n(SuperExec process\non SuperNode 2)

    Note over SuperLink,SuperNode2: Persistent Infrastructure (started first)
    Admin->>+SuperLink: Start SuperLink\n(zk0bot.sh server start)
    Note right of SuperLink: Listens on gRPC Fleet API\n(ports 9091-9093)
    Admin->>+SuperNode1: Start SuperNode 1\n(zk0bot.sh client start <dataset-uri1>)
    Note right of SuperNode1: e.g., dataset-uri1 = "shaunkirby/record-test"\nor "local:/data/client1_episodes"
    SuperNode1->>+SuperLink: Register via gRPC\n(Fleet API handshake)
    Note right of SuperNode1: Passes node-config\n(dataset-uri=uri1 → unique/private dataset)
    Admin->>+SuperNode2: Start SuperNode 2\n(zk0bot.sh client start <dataset-uri2>)
    Note right of SuperNode2: e.g., dataset-uri2 = "ethanCSL/direction_test"\nor private HF repo / local path
    SuperNode2->>+SuperLink: Register via gRPC\n(Fleet API handshake)
    Note right of SuperNode2: Passes node-config\n(dataset-uri=uri2 → unique/private dataset)
    Note over SuperLink,SuperNode2: SuperNodes now visible/registered in SuperLink logs

    Admin->>+SuperLink: Submit Run\n(zk0bot.sh run → flwr run)
    Note right of Admin: Uploads Flower App Bundle (FAB)\ncontaining ServerApp + ClientApp code
    SuperLink->>ServerApp: Spawn SuperExec process\nfor ServerApp execution
    Note right of ServerApp: ServerApp starts (strategy, rounds, etc.)
    SuperLink->>SuperNode1: Instruct to execute ClientApp\n(via registered Fleet API, sends FAB + config)
    SuperNode1->>ClientApp1: Spawn SuperExec process\nfor ClientApp
    Note over ClientApp1: ClientApp loads private/unique dataset\n(from injected node-config dataset-uri=uri1)\ne.g., HF dataset download or local path
    SuperLink->>SuperNode2: Instruct to execute ClientApp\n(via registered Fleet API, sends FAB + config)
    SuperNode2->>ClientApp2: Spawn SuperExec process\nfor ClientApp
    Note over ClientApp2: ClientApp loads private/unique dataset\n(from injected node-config dataset-uri=uri2)\ne.g., different HF repo or local episodes

    Note over ServerApp,ClientApp2: Federation begins (gRPC message passing via SuperLink/SuperNodes)
    loop For each federation round (e.g., Fit)
        ServerApp->>SuperLink: Send FitIns (parameters)\nto selected SuperNodes
        SuperLink->>SuperNode1: Forward FitIns (gRPC)
        SuperNode1->>ClientApp1: Forward to local SuperExec
        ClientApp1->>ClientApp1: Local training\non private unique dataset (from uri1)
        ClientApp1->>SuperNode1: Return FitRes (updated parameters)
        SuperNode1->>SuperLink: Forward FitRes
        SuperLink->>ServerApp: Deliver FitRes
        SuperLink->>SuperNode2: Forward FitIns (gRPC)
        SuperNode2->>ClientApp2: Forward to local SuperExec
        ClientApp2->>ClientApp2: Local training\non private unique dataset (from uri2)
        ClientApp2->>SuperNode2: Return FitRes
        SuperNode2->>SuperLink: Forward FitRes
        SuperLink->>ServerApp: Deliver FitRes
        ServerApp->>ServerApp: Aggregate updates\n(e.g., FedAvg)
    end

    Note over ServerApp,ClientApp2: Similar flow for Evaluate rounds\n(Server sends EvaluateIns, clients evaluate locally on their private datasets from uri1/uri2)
    Note over SuperLink, SuperNode2: Run completes → SuperExec processes terminate. SuperLink & SuperNodes remain running for next Run
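The "Aggregate updates (e.g., FedAvg)" step in the diagram can be illustrated with a toy calculation. In FedAvg, the server averages client parameters weighted by how many training examples each client used. The sketch below applies that rule to a single scalar per client; real aggregation runs over full model parameter tensors inside the ServerApp, and the diagram names FedAvg only as an example, so this is not a description of zk0's actual strategy. fedavg_scalar is a hypothetical name used for illustration.

```shell
# Toy FedAvg on one scalar parameter (illustration only).
# Reads lines of "<num_examples> <parameter_value>" from stdin, one per client,
# and prints the example-weighted mean: sum(n_k * w_k) / sum(n_k).
fedavg_scalar() {
  awk '{ n += $1; s += $1 * $2 }
       END { if (n > 0) printf "%.4f\n", s / n; else exit 1 }'
}

# Example call (commented out so that sourcing this file has no effect):
#   printf '1 2.0\n3 4.0\n' | fedavg_scalar
```

With one client holding 1 example at value 2.0 and another holding 3 examples at value 4.0, the weighted mean is (1*2.0 + 3*4.0) / 4 = 3.5, so the client with more data pulls the result toward its value.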

Updated Explanation of the Sequence (Dataset URI Flow)

The core architecture uses Flower’s Deployment Engine. Client data stays private: each client names its own dataset through the positional <dataset-uri> argument of zk0bot.sh client start <dataset-uri> and loads it locally.

CLI Change

Persistent Infrastructure First

Dynamic App Execution on Run Submission

Message Passing During Federation

Community and Support

Discord

Join our Discord community for support and updates: zk0 Discord

GitHub

Contact

Technical Details

System Requirements

Security

Performance

Advanced Configuration

Advanced tmux/Conda Configuration

zk0bot uses native Flower CLI + tmux for persistence. For custom setups:

Environment Variables

Contributing

We welcome contributions to improve the zk0 platform:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request
  4. Join our Discord for discussion

License

zk0 is open-source software licensed under the Apache 2.0 License.


Last updated: 2025-12-17