
Deploying CEPH in the homelab


I recently rebuilt my Ceph lab to more closely mirror a real production deployment, rather than the usual "it works, but don't look too closely" lab setup.

The goals were simple but non-negotiable:

  • 3 MONs (odd quorum)
  • 2 MGRs (HA control plane)
  • Host-level fault domain
  • Replication size = 3
  • RGW (S3) only — no CephFS, no RBD
  • Clean DNS (no /etc/hosts hacks)

This post walks through the exact process I used to deploy a clean, repeatable Ceph RGW cluster using cephadm on Ubuntu, with explicit placement control and zero surprises.

🧠 Cluster Design – Nodes & IPs

Monitor / Manager Nodes (Control Plane)

  • ceph-mon01 — MON + MGR — 172.16.1.81
  • ceph-mon02 — MON + MGR — 172.16.1.82
  • ceph-mon03 — MON + MGR — 172.16.1.83

RGW (S3 Gateway)

  • ceph-rgw01 — RGW Gateway — 172.16.1.86

OSD Storage Nodes

  • ceph-osd01 — OSD Node (6 × 80 GB disks) — 172.16.1.91
  • ceph-osd02 — OSD Node (6 × 80 GB disks) — 172.16.1.92
  • ceph-osd03 — OSD Node (6 × 80 GB disks) — 172.16.1.93

πŸ“‹ Cluster Requirements

  • Ceph deployed via cephadm
  • Object storage only (RGW / S3)
  • Replication size = 3
  • Root SSH enabled (lab convenience)
  • DNS must exist (forward + reverse) for all hosts
  • /etc/hosts is not allowed for Ceph nodes


πŸ› ️ Prepare All Nodes

Set FQDN Hostnames (run the correct command on each node)

# ceph-mon01
hostnamectl set-hostname ceph-mon01.lilbits.xyz

# ceph-mon02
hostnamectl set-hostname ceph-mon02.lilbits.xyz

# ceph-mon03
hostnamectl set-hostname ceph-mon03.lilbits.xyz

# ceph-rgw01
hostnamectl set-hostname ceph-rgw01.lilbits.xyz

# ceph-osd01
hostnamectl set-hostname ceph-osd01.lilbits.xyz

# ceph-osd02
hostnamectl set-hostname ceph-osd02.lilbits.xyz

# ceph-osd03
hostnamectl set-hostname ceph-osd03.lilbits.xyz


πŸ” DNS Sanity Check (run on every node)

hostname -f
getent hosts $(hostname -f)
getent hosts 172.16.1.81

If this doesn’t work, stop here — Ceph will absolutely punish bad DNS later.
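For reference, the records I'm assuming look roughly like the BIND-style snippet below, one A record plus one PTR record per node (only three hosts shown; the rest follow the same pattern). Adapt to whatever DNS server you actually run:

; lilbits.xyz (forward zone)
ceph-mon01   IN  A    172.16.1.81
ceph-rgw01   IN  A    172.16.1.86
ceph-osd01   IN  A    172.16.1.91

; 1.16.172.in-addr.arpa (reverse zone)
81           IN  PTR  ceph-mon01.lilbits.xyz.
86           IN  PTR  ceph-rgw01.lilbits.xyz.
91           IN  PTR  ceph-osd01.lilbits.xyz.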


πŸ“¦ Install Required Packages

apt update && apt upgrade -y
apt install -y curl vim lvm2 chrony openssh-server docker.io
systemctl enable --now docker

πŸ” Enable Root SSH (lab choice)

passwd root
nano /etc/ssh/sshd_config

Ensure:

PermitRootLogin yes
PasswordAuthentication yes

systemctl restart sshd

πŸš€ Bootstrap Ceph (on ceph-mon01 only)

Install cephadm

curl --silent --remote-name https://raw.githubusercontent.com/ceph/ceph/quincy/src/cephadm/cephadm
chmod +x cephadm
mv cephadm /usr/local/bin/
cephadm install

Bootstrap the cluster

cephadm bootstrap \
  --mon-ip 172.16.1.81 \
  --mon-id ceph-mon01.lilbits.xyz \
  --allow-fqdn-hostname \
  --skip-prepare-host

Note (Ceph Quincy):
Additional MONs added later may appear with short IDs (e.g. mon.ceph-mon02). This is expected and does not affect quorum or DNS usage.
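Before touching placement, do a quick sanity check that the bootstrap left you with one MON and one MGR running. Expect HEALTH_WARN at this point, since there are no OSDs yet:

cephadm shell -- ceph -s
cephadm shell -- ceph orch ps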

Temporary placement safety (important!)

ceph orch apply mon --placement="1 ceph-mon01.lilbits.xyz"
ceph orch apply mgr --placement="1 ceph-mon01.lilbits.xyz"

This prevents Ceph from temporarily placing MON/MGR daemons on random hosts before we apply labels.
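You can confirm the pinned placement took effect before moving on:

ceph orch ls
ceph orch ps --daemon-type mon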


πŸ”‘ Fix SSH Key Distribution

Cephadm generates its SSH key here:

/etc/ceph/ceph.pub

Copy it to all nodes:

ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-rgw01.lilbits.xyz
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-osd01.lilbits.xyz
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-osd02.lilbits.xyz
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-osd03.lilbits.xyz
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-mon02.lilbits.xyz
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-mon03.lilbits.xyz
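If you'd rather not type that six times, a small loop over the same host list does the job (assuming the root password is the same everywhere, as set above):

for h in ceph-rgw01 ceph-osd01 ceph-osd02 ceph-osd03 ceph-mon02 ceph-mon03; do
  ssh-copy-id -f -i /etc/ceph/ceph.pub root@${h}.lilbits.xyz
done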

➕ Add Hosts to the Cluster

From the cephadm shell on mon01:

ceph orch host add ceph-rgw01.lilbits.xyz
ceph orch host add ceph-osd01.lilbits.xyz
ceph orch host add ceph-osd02.lilbits.xyz
ceph orch host add ceph-osd03.lilbits.xyz
ceph orch host add ceph-mon02.lilbits.xyz
ceph orch host add ceph-mon03.lilbits.xyz

🏷️ Label Hosts (this is where control happens)

# MON + MGR
ceph orch host label add ceph-mon01.lilbits.xyz mon
ceph orch host label add ceph-mon01.lilbits.xyz mgr
ceph orch host label add ceph-mon02.lilbits.xyz mon
ceph orch host label add ceph-mon02.lilbits.xyz mgr
ceph orch host label add ceph-mon03.lilbits.xyz mon
ceph orch host label add ceph-mon03.lilbits.xyz mgr

# RGW
ceph orch host label add ceph-rgw01.lilbits.xyz rgw

# OSDs
ceph orch host label add ceph-osd01.lilbits.xyz osd
ceph orch host label add ceph-osd02.lilbits.xyz osd
ceph orch host label add ceph-osd03.lilbits.xyz osd
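Verify that all seven hosts are registered and carry the labels you expect; the LABELS column of the host listing should match the plan above:

ceph orch host ls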

🧭 Deploy Additional MONs

ceph orch daemon add mon ceph-mon02.lilbits.xyz
ceph orch daemon add mon ceph-mon03.lilbits.xyz

πŸ”’ Lock MON and MGR Placement (don’t skip this)

ceph orch apply mon --placement="label:mon"
ceph orch apply mgr --placement="2 label:mgr"

This guarantees:

  • Exactly 3 MONs
  • Exactly 2 MGRs
  • Zero daemon drift

Verify:

ceph mgr stat
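To check the MON side as well, confirm quorum and where the daemons actually landed:

ceph mon stat
ceph orch ps --daemon-type mon
ceph orch ps --daemon-type mgr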

πŸ’½ Deploy OSDs

ceph orch device ls
ceph orch apply osd --all-available-devices

You should end up with 18 OSDs total (6 per node).
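Two quick checks confirm the count and the host-level layout CRUSH will use: ceph osd stat should report 18 OSDs up and in, and ceph osd tree should show six OSDs under each of the three OSD hosts.

ceph osd stat
ceph osd tree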


☁️ Deploy RGW (S3)

ceph orch apply rgw 1 --placement="1 ceph-rgw01.lilbits.xyz"
ceph orch ps --daemon-type rgw
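To prove the gateway actually serves S3, create a test user and point an S3 client at it. The user name, bucket name, and endpoint port below are illustrative (cephadm puts RGW on port 80 by default unless told otherwise); radosgw-admin runs from the cephadm shell on mon01, and the aws calls run wherever you have awscli installed:

radosgw-admin user create --uid=labtest --display-name="Lab Test User"
# copy access_key / secret_key from the JSON output, then:
export AWS_ACCESS_KEY_ID=<access_key>
export AWS_SECRET_ACCESS_KEY=<secret_key>
aws --endpoint-url http://ceph-rgw01.lilbits.xyz s3 mb s3://test-bucket
aws --endpoint-url http://ceph-rgw01.lilbits.xyz s3 ls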

🧱 Set Replication Size = 3

RGW system pools (after RGW deploy)

ceph osd pool set .rgw.root size 3
ceph osd pool set default.rgw.control size 3
ceph osd pool set default.rgw.meta size 3
ceph osd pool set default.rgw.log size 3

Defaults for future bucket pools

ceph config set global osd_pool_default_size 3
ceph config set global osd_pool_default_min_size 2
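You can confirm both the per-pool changes and the new defaults stuck:

ceph osd pool ls detail | grep rgw
ceph config dump | grep osd_pool_default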

✅ Final Result

At this point you have:

  • 3-node MON quorum
  • HA MGR (1 active, 1 standby)
  • Host-level CRUSH failure domain
  • All RGW pools replicated ×3
  • Future buckets inherit correct durability
  • A Ceph lab that behaves like production

If you’ve ever wondered why Ceph gets a bad reputation — it’s usually because people skip steps like DNS, placement locking, or CRUSH validation. Don’t do that. πŸ˜‰
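Speaking of CRUSH validation: the default replicated rule should already use a host failure domain, but it takes ten seconds to confirm rather than assume. Dump the rule and look for "type": "host" in the chooseleaf step (replicated_rule is the stock name; check ceph osd crush rule ls if yours differs):

ceph osd crush rule dump replicated_rule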

Happy clustering πŸ™
