Creating a Self-Documenting Homelab

I had been searching online for how others document their cybersecurity homelabs, but I did not find anything of interest. I am also certain that writing documentation is most people’s least favorite (or least considered) task. That’s why we are cursed with so many libraries without basic documentation 🙂 In the following, I am going to show you how I lay out the documentation for my homelab on GitHub. I had the following goals for my documentation:

Make adding to the documentation simple and fast utilizing automation
Provide enough documentation so anyone else could recreate this environment if needed
Allow myself to quickly reference previous configuration of VMs, or the configuration of my entire homelab!

Starting Point

When you look back on a project that you failed to document properly, it is easy to get overwhelmed about where to start. What has always helped me is understanding what I currently have, and then figuring out what is most important to document first.

I at least had a starting point: some diagrams that I could reuse from my previous blog posts. In a way, those blog posts also serve as a form of documentation, but their intention is to entertain and inform. Documentation should always inform, and only maybe entertain. If you want a laugh, watch this video showing the source code comments from The Simpsons: Hit & Run.

I also had my notes that I kept as I was building out the homelab. They have some comments about why I made the decisions I did that will be valuable as I build this out.

Scripting time

First and foremost, the heart and soul of my homelab: my virtual machines. I decided to document them by generating .yaml files.

A goal of this homelab was automation, so I turned to Proxmoxer, a Python wrapper for the Proxmox API, which I can use to get all the details about my VMs or frankly anything else that I want from my server. It also seems like a cool way to push data into dashboards, because do you even have a homelab if you don’t have random dashboards showing your uptime? For now, however, we are all about documentation!

I created a script, which you can find at this GitHub link under scripts/documentation, that outputs a YAML file like this:

vms:
- vmid: 201
  name: pfFirewall
  memory_mb: '2048'
  cores: 4
  sockets: 1
  network_interfaces:
  - interface: net0
    bridge: vmbr0
    firewall: '1'
  - interface: net1
    bridge: vmbr2
    firewall: '1'
  disks:
  - name: scsihw
  - name: scsi0
    storage: LVM-SSD
    size: 32G
  - name: ide2
    storage: local
    size: 966536K
  node: pve
- vmid: 202
  name: kaliMachine
  memory_mb: '8192'
  cores: 4
  sockets: 1
  network_interfaces: 
  
  .....
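To give an idea of how a script like this can be put together, here is a minimal sketch using Proxmoxer. It is not the script from the repository: the function names, the PVE_* environment variable names and the exact parsing rules are my assumptions for illustration. The Proxmoxer calls themselves (nodes, qemu, config) mirror the Proxmox API paths.

```python
import os


def parse_options(value):
    """Turn 'virtio=AA:BB,bridge=vmbr0,firewall=1' into a dict of key/value pairs."""
    options = {}
    for item in str(value).split(","):
        key, sep, val = item.partition("=")
        if sep:
            options[key] = val
    return options


def summarize_vm(node, vmid, config):
    """Reduce a raw Proxmox VM config dict to the fields worth documenting."""
    nics, disks = [], []
    for key in sorted(config):
        value = str(config[key])
        prefix = key.rstrip("0123456789")
        if prefix == key:
            continue  # no trailing index (name, cores, scsihw, ...): not a NIC or disk
        if prefix == "net":
            opts = parse_options(value)
            nics.append({"interface": key, "bridge": opts.get("bridge"),
                         "firewall": opts.get("firewall", "0")})
        elif prefix in ("scsi", "sata", "ide", "virtio"):
            # Disk values look like 'LVM-SSD:vm-201-disk-0,size=32G'
            volume = value.split(",")[0]
            storage = volume.split(":")[0] if ":" in volume else None
            disks.append({"name": key, "storage": storage,
                          "size": parse_options(value).get("size")})
    return {
        "vmid": vmid,
        "name": config.get("name"),
        "memory_mb": int(config.get("memory", 0)),
        "cores": config.get("cores"),
        "sockets": config.get("sockets"),
        "network_interfaces": nics,
        "disks": disks,
        "node": node,
    }


def fetch_vm_summaries():
    """Query every node for its VMs. Needs a reachable Proxmox host and an API token."""
    from proxmoxer import ProxmoxAPI  # third-party: pip install proxmoxer

    proxmox = ProxmoxAPI(
        os.environ["PVE_HOST"],                    # hypothetical variable names
        user=os.environ["PVE_USER"],               # e.g. 'root@pam'
        token_name=os.environ["PVE_TOKEN_NAME"],
        token_value=os.environ["PVE_TOKEN_VALUE"],
        verify_ssl=False,                          # typical for a self-signed homelab cert
    )
    vms = []
    for node in proxmox.nodes.get():
        name = node["node"]
        for vm in proxmox.nodes(name).qemu.get():
            config = proxmox.nodes(name).qemu(vm["vmid"]).config.get()
            vms.append(summarize_vm(name, vm["vmid"], config))
    return {"vms": sorted(vms, key=lambda vm: vm["vmid"])}


# Offline example with a made-up config dict shaped like the API response.
sample_config = {
    "name": "pfFirewall", "memory": "2048", "cores": 4, "sockets": 1,
    "scsihw": "virtio-scsi-pci",
    "net0": "virtio=AA:BB:CC:DD:EE:01,bridge=vmbr0,firewall=1",
    "scsi0": "LVM-SSD:vm-201-disk-0,size=32G",
}
summary = summarize_vm("pve", 201, sample_config)
```

Passing the result of fetch_vm_summaries() to PyYAML's yaml.safe_dump(..., sort_keys=False) would write it out in the order shown above. Keeping the parsing in summarize_vm separate from the API calls means it can be tried against a saved config without touching the server.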

In reality, though, I doubt there will be a time when I want to look through a list of all my virtual machines. It is still good to have for documentation purposes, but I think it would be better to divide the machines into directories by VLAN. That way, if I wanted to view the configuration of my kali-linux machine, I would just go to the folder dedicated to its VLAN. For example, if this is my current structure:

I would have a structure like this:

services/ 
	vlan-1-management/ 
		kali-linux/ 
			config.yaml 
			...
			...
	vlan-10-infrastructure/ 
	vlan-20-domain/
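A layout like this can be produced with a few lines of pathlib. The sketch below is only an illustration: how a VM maps to a VLAN folder is my assumption (here, the bridge of its first network interface, looked up in a hypothetical table), and the folder for each machine simply reuses the VM name.

```python
from pathlib import Path

# Hypothetical mapping from a VM's first bridge to its VLAN folder.
BRIDGE_TO_VLAN_DIR = {
    "vmbr0": "vlan-1-management",
    "vmbr1": "vlan-10-infrastructure",
    "vmbr2": "vlan-20-domain",
}


def config_path(root, vm):
    """Return services-style path <root>/<vlan folder>/<vm name>/config.yaml."""
    nics = vm.get("network_interfaces") or []
    bridge = nics[0]["bridge"] if nics else None
    vlan_dir = BRIDGE_TO_VLAN_DIR.get(bridge, "unassigned")
    return Path(root) / vlan_dir / vm["name"] / "config.yaml"


def write_vm_config(root, vm, rendered_yaml):
    """Create the folders if needed and write the already-rendered YAML text."""
    path = config_path(root, vm)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(rendered_yaml, encoding="utf-8")
    return path
```

Machines whose bridge is not in the table end up in an "unassigned" folder rather than raising an error, so a new VM never breaks a documentation run.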

Inside each machine’s folder I could keep notes, scripts and anything else related to that machine! I could also store these in larger “catch-all” folders like my scripts/ folder mentioned previously. I like this approach because it gives me a structure for navigating to specific information super quickly, while still allowing more general fuzzy searches across the larger files. Thankfully, my original script does not need much modification, just some noodling with the directory writing. Once that is complete, I have a structure that looks like this:

Perfect! Now it is time to think of other things that should be automated. One thing that I really want to track in a file is how much storage I am allocating to each of my machines. In Proxmox you can allocate resources to a machine, but that does not mean they are automatically reserved for that machine. Any allocation that is not actively in use, whether disk or RAM, can be used by other machines. In this way, you can “overallocate” beyond what your server actually has. So I want to track the total amount of disk space that I have allocated on each storage, just so I can watch that usage and change where things are stored if needed.

I first created a simple file that just shows each storage and its current usage, which looks like this:

Then I created another that shows the amount of storage that has been allocated as a whole, with a breakdown per VM:


.... 

LVM-SSD:
    total_gb: 912.76
    used_gb: 152.71
    available_gb: 760.06
    allocated_to_vms_gb: 752
    vms:
    - vm_name: pfFirewall
      vmid: 201
      allocated_gb: 32
    - vm_name: kaliMachine
      vmid: 202
      allocated_gb: 80
    - vm_name: docker-server
      vmid: 203
      allocated_gb: 160
    - vm_name: nessusMachine
      vmid: 206
      allocated_gb: 40
  
......
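The per-storage breakdown comes down to converting size strings such as 32G or 966536K into one unit and summing them per storage. Here is a minimal sketch of that arithmetic, assuming VM records shaped like the YAML from the first script; the function names are mine, and the sizes are treated as binary units.

```python
from collections import defaultdict

# Multipliers to convert each size suffix into GB (binary units).
UNIT_TO_GB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1, "T": 1024}


def size_to_gb(size):
    """Convert a size string such as '32G' or '966536K' to GB."""
    number, unit = size[:-1], size[-1].upper()
    return float(number) * UNIT_TO_GB[unit]


def allocation_by_storage(vms):
    """Sum the disk sizes of every VM, grouped by the storage they live on."""
    totals = defaultdict(lambda: {"allocated_to_vms_gb": 0.0, "vms": []})
    for vm in vms:
        per_storage = defaultdict(float)
        for disk in vm.get("disks", []):
            # Entries without a storage or size (e.g. a controller line) are skipped.
            if disk.get("storage") and disk.get("size"):
                per_storage[disk["storage"]] += size_to_gb(disk["size"])
        for storage, gb in per_storage.items():
            entry = totals[storage]
            entry["allocated_to_vms_gb"] = round(entry["allocated_to_vms_gb"] + gb, 2)
            entry["vms"].append({"vm_name": vm["name"], "vmid": vm["vmid"],
                                 "allocated_gb": round(gb, 2)})
    return dict(totals)


sample_vms = [
    {"vmid": 201, "name": "pfFirewall", "disks": [
        {"name": "scsihw"},
        {"name": "scsi0", "storage": "LVM-SSD", "size": "32G"},
        {"name": "ide2", "storage": "local", "size": "966536K"},
    ]},
    {"vmid": 202, "name": "kaliMachine", "disks": [
        {"name": "scsi0", "storage": "LVM-SSD", "size": "80G"},
    ]},
]
allocation = allocation_by_storage(sample_vms)
```

Note that this sketch also counts ISO images attached as CD-ROM drives (the ide2 entry), which you may want to filter out. The total, used and available figures are not computed here; they would come from the storage listing of the Proxmox API.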

Lastly, I wanted to do the same for my RAM. My server is currently running 32 GB of RAM, which is honestly not that much considering how much of it Security Onion eats (I am eagerly awaiting a sale at Micro Center).

While it’s not possible to get accurate RAM usage through these scripts without the machines being active, I can do the same as I did with disk storage and at least see the RAM that I have allocated:

node: pve
ram_allocations:
  total_allocated_gb: 69.21
  vms:
  - vm_name: pfFirewall
    vmid: 201
    allocated_gb: 2.0
  - vm_name: kaliMachine
    vmid: 202
    allocated_gb: 8.0 
    
    ....
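The RAM file is a straightforward sum of each VM's configured memory. As a small sketch (the overcommit_ratio field is my own addition and not part of the file above, and the function name is hypothetical):

```python
def ram_report(vms, physical_gb):
    """Sum configured RAM per VM and compare it with the host's physical RAM."""
    rows = [
        {"vm_name": vm["name"], "vmid": vm["vmid"],
         "allocated_gb": round(int(vm["memory_mb"]) / 1024, 2)}
        for vm in vms
    ]
    total = round(sum(row["allocated_gb"] for row in rows), 2)
    return {
        "total_allocated_gb": total,
        # Above 1.0 means more RAM is promised to VMs than the host has.
        "overcommit_ratio": round(total / physical_gb, 2),
        "vms": rows,
    }


report = ram_report(
    [
        {"name": "pfFirewall", "vmid": 201, "memory_mb": "2048"},
        {"name": "kaliMachine", "vmid": 202, "memory_mb": "8192"},
    ],
    physical_gb=32,
)
```

With the 69.21 GB allocated above on a 32 GB host, the ratio works out to about 2.16, which is only workable because not every machine runs at once.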

Now, I think it is time to get into the network configuration and how to document that!

Documenting the Network

Up to this point, I had done all of my documentation from my MacBook. Because it is connected to the same switch as my server, I have been able to run all of the previous scripts without a VPN. For documenting the network, however, I am going to need access to my pfSense firewall, which is hosted on a virtual machine “inside” my server.

The solution here is a VPN to access the pfSense instance inside the server. As I covered in a previous blog post, I had set up Tailscale to connect to my homelab network, but that used the pfSense router that is “outside” of my server. Sound confusing? Don’t worry, because I am confused too, but that’s what documentation is for. Looking at our logical and physical topologies should help us out here:

You can see that I have a pfSense router that is connected to a powerline adapter in my house. This then provides the internet connection for my Proxmox machine.

I could get rid of this headache by running these documentation scripts from inside my server, say on my Kali Linux machine. However, I wanted to be able to administer all of this from my MacBook for two reasons:

Everything I do on my homelab is managed through the Proxmox GUI that I run on this MacBook.
It’s always more comfortable to create and run scripts on my MacBook, and I personally did not want to have to commit to my GitHub just to verify a script was working, or worse, have to write the script through a remote connection to a Kali Linux machine.

So I wondered whether I could set up Tailscale on the pfSense firewall that is INSIDE my server, to see if that would let me access that firewall and generate my documentation. To do this properly, we have to jump through a few hoops. I have found that, when working with pfSense, you need to reach the admin panel through the gateway IP of a subnet.

I won’t discuss setting up Tailscale on pfSense here; I have done it before, and there are various resources that explain it better than I could (see this video for further info). What matters is that I advertise a subnet whose gateway I can use to reach the pfSense firewall. In this case I will advertise the native subnet:

It’s notes like these that can’t be automated and are super important for recalling WHY the infrastructure is set up a certain way. So notes like that will also go into my GitHub repository, under the documentation directory.

Networking Scripts

Now that we can reach the pfSense instance we want through Tailscale, we should be able to connect to it from our MacBook using its IP address.

Up to this point, most of the scripts I have written were designed with extensibility in mind. If you wanted to use them on your own server, all you would have to do is change some variables in your .env file and the directory paths where you want the YAMLs to be stored. However, my network documentation scripts might not be as applicable to your own setup. Regardless, I am certain you can take pieces of them and apply them to your own homelab if you so desire!

Now, to interact with our pfSense firewall and get stats from it, we need the following unofficial GitHub package by Jared Hendrickson. This API package is not officially supported by pfSense, but supposedly it is a great tool! Make sure that you are running pfSense Community Edition 2.8.0 or above if you are doing this on your own homelab.

After installing the package, we can configure the API and generate an API key. Also note that we can set the interfaces that accept API calls:

Doing so will allow us to specify what we want to access. For now I am going to allow the LAN interface to accept API calls, so we can get the information that we need for initial configuration.

Now that everything is set up on the configuration end, it is time to get back to programming. Since we are dealing with a REST API, I went ahead and wrote a custom class around the requests library in Python: the pfSense API requires that an API key is generated and passed in the header, and I didn’t want to have to do that more than once. With that set up, I can finally get to accessing our endpoints and going through some data! What’s amazing about this API is the sheer number of endpoints we can access; just look at this list and how well documented it is. I feel pretty confident that I will not only be able to get all the data I want from this API, but also leverage it in the future for some automation tasks if needed.
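A wrapper of this kind can stay very small. The following is a sketch and not the class from the repository: the class name, the X-API-Key header name and the example endpoint path are assumptions on my part, so check them against the package's documentation for your version.

```python
import requests


class PfSenseClient:
    """Thin wrapper that attaches the API key to every request."""

    def __init__(self, base_url, api_key, verify_tls=True):
        self.base_url = base_url.rstrip("/")
        self.session = requests.Session()
        # Assumed header name; the key is set once and reused by every call.
        self.session.headers.update({"X-API-Key": api_key,
                                     "Accept": "application/json"})
        # Set verify_tls=False only if pfSense uses a self-signed certificate.
        self.session.verify = verify_tls

    def url(self, endpoint):
        """Join the base URL and an endpoint path without doubling slashes."""
        return f"{self.base_url}/{endpoint.lstrip('/')}"

    def get(self, endpoint, **params):
        """GET an endpoint and return the decoded JSON body."""
        response = self.session.get(self.url(endpoint), params=params, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        return response.json()


# Placeholder host and key; nothing is sent until .get() is called.
client = PfSenseClient("https://pfsense.example", api_key="replace-me")
```

A call would then look like client.get("/api/v2/firewall/rules"), with the path being an example rather than a confirmed endpoint. Using a Session also reuses the connection between calls, which is handy when a script walks through several endpoints in one run.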

So what configuration do I want to pull and show on my GitHub?

Information about my interfaces (LAN and VLANs)
Firewall rules

In time this list will likely expand as I do more projects, and I’m certain that will be relatively straightforward now that we have the setup out of the way.

Before using Python requests to reach out to the API, I decided to test some endpoints with curl to make sure they worked properly. There was one issue: the endpoints shown in the documentation were redirecting. Calling curl with -L solves this, and the requests library in Python follows redirects automatically.

Finally, I created a script that got me information about each of the interfaces on my pfsense routers, including VLANs and their allocated address ranges:

opt1:
    interface_info:
      id: opt1
      description: VLAN10
      ipv4_address: 10.10.10.254
      subnet: 24
      physical_interface: vtnet1.10
      enabled: true
      ipv4_type: static
      gateway: null
      block_private_networks: false
      block_bogon_networks: false
    dhcp_config:
      enabled: true
      range_from: 10.10.10.50
      range_to: 10.10.10.100
      dns_servers:
      - 10.10.10.254
      - 8.8.8.8
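Once the ranges are in a file, they can also be checked automatically. As a small sketch using only Python's ipaddress module, the hypothetical helper below takes one interface entry shaped like the YAML above and tells me whether a static IP I am about to assign is inside the subnet and clear of the DHCP pool.

```python
import ipaddress


def check_static_ip(candidate, interface):
    """Return 'ok' or a short reason why the address should not be used."""
    info, dhcp = interface["interface_info"], interface["dhcp_config"]
    gateway = ipaddress.ip_interface(f'{info["ipv4_address"]}/{info["subnet"]}')
    network = gateway.network
    ip = ipaddress.ip_address(candidate)

    if ip not in network:
        return f"{ip} is outside {network}"
    if ip in (network.network_address, network.broadcast_address):
        return f"{ip} is reserved in {network}"
    if ip == gateway.ip:
        return f"{ip} is the interface address"
    low = ipaddress.ip_address(dhcp["range_from"])
    high = ipaddress.ip_address(dhcp["range_to"])
    if dhcp["enabled"] and low <= ip <= high:
        return f"{ip} is inside the DHCP pool {low}-{high}"
    return "ok"


# The VLAN10 entry from the generated file, reduced to the fields used here.
vlan10 = {
    "interface_info": {"ipv4_address": "10.10.10.254", "subnet": 24},
    "dhcp_config": {"enabled": True, "range_from": "10.10.10.50",
                    "range_to": "10.10.10.100"},
}

print(check_static_ip("10.10.10.20", vlan10))  # ok
print(check_static_ip("10.10.10.75", vlan10))  # 10.10.10.75 is inside the DHCP pool 10.10.10.50-10.10.10.100
print(check_static_ip("10.10.20.5", vlan10))   # 10.10.20.5 is outside 10.10.10.0/24
```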

Having these ranges written down is awesome for me, as sometimes when assigning an IP address on a new VLAN, I forget the ranges I set! I also went ahead and created another script that generates information about the virtual bridges that I have running on Proxmox:

... 

generated_at: '2025-10-01T08:06:18.315785'
proxmox_node: pve
virtual_bridges:
  vmbr5:
    name: vmbr5
    type: OVSBridge
    priority: 6
    active: true
    cidr: null
    bridge_ports: ''
    comments: Network TAP for vmbr0 for so
    autostart: true
    method: manual
    gateway: null
 
 ...
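For the bridges, the Proxmox API exposes a node's network interfaces, and Proxmoxer can filter that list by type. The sketch below shows one way the file above could be assembled; the function names are mine, and the exact fields returned can differ between Linux and OVS bridges, so missing fields simply come out as null.

```python
from datetime import datetime

BRIDGE_FIELDS = ("type", "priority", "cidr", "bridge_ports",
                 "comments", "method", "gateway")


def summarize_bridges(node, interfaces):
    """Shape the raw interface list into the documented YAML structure."""
    bridges = {}
    for iface in sorted(interfaces, key=lambda item: item["iface"]):
        entry = {"name": iface["iface"]}
        entry.update({field: iface.get(field) for field in BRIDGE_FIELDS})
        # The API reports flags as 1/0; store them as real booleans.
        entry["active"] = bool(iface.get("active"))
        entry["autostart"] = bool(iface.get("autostart"))
        bridges[iface["iface"]] = entry
    return {
        "generated_at": datetime.now().isoformat(),
        "proxmox_node": node,
        "virtual_bridges": bridges,
    }


def fetch_bridges(proxmox, node):
    """`proxmox` is expected to be a connected proxmoxer.ProxmoxAPI instance."""
    return summarize_bridges(node, proxmox.nodes(node).network.get(type="any_bridge"))


# Offline example with a made-up interface record.
sample = [{"iface": "vmbr5", "type": "OVSBridge", "priority": 6,
           "active": 1, "autostart": 1, "method": "manual"}]
bridge_doc = summarize_bridges("pve", sample)
```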

I am certainly likely to add more, but this blog post is getting quite long, so it is time to put a cap on it. If you would like to see more, please go ahead and check out the GitHub repo page for it.

Moving forward!

What is next for my homelab documentation? Besides adding a few missing details about my homelab setup and past documentation I have, the big question is: how could I make this even more “self-documenting”? Well, I would want these files to update every time I push to this homelab repository, since every time I push I am likely making a manual update to something that I want to document. Seems like the perfect place to slot in a GitHub Action!

Further, my code currently does not recognize when nothing has changed between configurations, and therefore that no file should change. If each file gets fully rewritten each time, our change history on GitHub isn’t as precise. This is something else that I plan on working on in the future. Right now, it’s time to start adding documentation about what I have been doing on my homelab, and adding notes for each of the machines! Until then 🙂