Goal: one server class machine providing multiple student seats with a dedicated GPU for each seat.
Also tried hand-rolled Linux + KVM, Unraid, Proxmox - several worked, but not reliably. ESXi, once working, has been solid. Also had issues with USB passthrough setup.
ACS & PCI Lanes
- be careful with Intel processors, including Xeons; many have relatively few PCIe lanes, so lanes end up shared between devices. Devices on shared lanes can only be passed through to the same VM
- you also need properly working and setup ACS
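For the hand-rolled Linux + KVM attempt (not ESXi), a rough way to see which devices are stuck together is to list the IOMMU groups the kernel exposes under /sys/kernel/iommu_groups. This is a minimal sketch; the function name list_iommu_groups is made up for illustration, and it assumes the IOMMU is enabled so that the directory is populated. Devices that land in the same group generally have to go to the same VM.

```shell
#!/bin/sh
# Sketch for Linux/KVM hosts only: print every IOMMU group and the PCI
# addresses inside it. The optional argument overrides the sysfs root
# (assumed default: /sys/kernel/iommu_groups).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group in "$root"/*; do
        # Skip anything that is not a group directory (e.g. empty glob)
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            echo "group ${group##*/}: ${dev##*/}"
        done
    done
}
```

Calling list_iommu_groups with no argument reads the live sysfs tree. If a GPU shares a group with another device you want in a different VM, ACS is either missing or not set up properly.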
- most of the virtualization systems have quirks with USB. E.g. unable to dynamically allocate. Or will pass through everything except HID (keyboard, mouse)
- many people end up using virtualhere
- XHCI (USB 1, 2, 3) is (probably) your enemy if you want to passthrough chipset/motherboard USB to VMs
- make sure to configure USB to be EHCI (ie USB 1, 2)
- our X99 chipset based motherboard presents only one XHCI controller, and it’s ineligible for passthrough. Set up as EHCI, it presents two controllers, both of which can be passed through independently to different VMs
- XHCI typically supports far fewer endpoints (remember one USB device can have multiple endpoints)
- USB 3 endpoints require 2x the memory of USB 2 endpoints
- this really affects larger multiseat setups
- lots of interesting details in this discussion thread
using version 6.5 due to problems passing through USB cards to VMs (these have since been fixed!)
- Article with instructions, VMWare KB re disabling native drivers
- steps from admin shell:
- verify available AHCI packages:
esxcli software vib list | grep ahci
- disable the ESXi native AHCI driver:
esxcli system module set --enabled=false --module="vmw_ahci"
- verify it’s disabled in config:
esxcli system module list | less
- reboot ESXi server, verify it’s not loaded (repeat last command)
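The "verify" steps above mean scanning the whole module list by eye. A small filter can pull out one module instead. This is a sketch only: module_state is a made-up helper name, and it assumes the output of esxcli system module list has three whitespace-separated columns (Name, Is Loaded, Is Enabled).

```shell
#!/bin/sh
# Sketch: read `esxcli system module list` output on stdin and print
# "<is-loaded> <is-enabled>" for the module named in $1.
# Assumption: columns are Name, Is Loaded, Is Enabled.
# Exits non-zero if the module is not listed at all.
module_state() {
    awk -v mod="$1" '
        $1 == mod { print $2, $3; found = 1 }
        END       { exit !found }
    '
}
```

Usage on the ESXi shell would be: esxcli system module list | module_state vmw_ahci. Before the reboot only the "enabled" column should have changed; after the reboot both columns should read false.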
to get USB PCI Card passthrough to fully work (not just be indicated as working in the WebGUI):
- disable XHCI, enable EHCI
- may need to disable the vmkusb native USB driver and use the legacy driver. Similar steps to the AHCI driver above. KB article
- Note: tried switching to legacy driver, didn’t work. Disabled XHCI and everything started working. Tried reenabling vmkusb but the system refused. Don’t know if vmkusb + EHCI will work.
- Don’t need to fiddle with passthru.map. That’s for indicating if individual functions of multifunction cards can be assigned to separate VMs, and how to reset them. E.g. a NIC card with two sockets: can you assign each socket to separate VMs, such that traffic to one is not visible to the other; that changing the mode of one doesn’t affect the other, etc. Details.
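For the vmkusb step, "similar steps to the AHCI driver" presumably means the same esxcli system module set call with the module name swapped. The sketch below assumes exactly that. The helper names run and set_module_enabled are made up for illustration, and commands are only printed unless DRY_RUN=0 is set, since a wrong guess here can leave the host without USB.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute only when
# DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch: enable or disable a VMkernel module, then list modules so the
# change can be checked. $1 = module name, $2 = true|false.
# Assumption: vmkusb accepts the same flags used for vmw_ahci above.
set_module_enabled() {
    run esxcli system module set --enabled="$2" --module="$1"
    run esxcli system module list
}
```

set_module_enabled vmkusb false prints the two commands for review. The change only takes effect after a reboot, as with the AHCI driver.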
don’t extend the local datastore across the two SSDs by adding the 2nd one as an extent; there is no load balancing across extents. Instead, create a second datastore and take your best guess as to which drive should hold each VM’s backing files.
- eg assign workstation 1 & 3 to one drive, and 2 & 4 to the other.
if datastore not removable, check to see if swap is enabled and set to the datastore. Also search the advanced settings for log and make sure they’re not set to the datastore. If scratch is, you have to set it to a valid path- eg
install a single guest VM, and assign it the motherboard USB controllers. Check to see if ESXi boot USB drive appears in the guest…
- for this reason + issues with external flash drive reliability, installed ESXi on a small internal flash drive. Counter to published best practices, but it both stopped the boot-drive-appearing-in-Guest and the reliability issues.
Adding Dedicated Network Links
- eg dedicated to NFS traffic
- On ESXi (web) console, go to
- Add a new virtual switch (if needed)
- Add the (dedicated) physical NIC as the uplink
- Add a new port group (that uses the new vswitch)
- Add a new kernel NIC
- configure the kernel NIC with the appropriate IP settings
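The same steps can be done from the ESXi shell instead of the web console. This is a rough sketch, not a tested recipe: the helper names run and add_dedicated_link are made up, and every value passed in (switch, NIC, port group, kernel NIC, addresses) is a placeholder to replace with your own. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute only when
# DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch of the web-console steps as esxcli calls.
# Args: vswitch, physical NIC, port group, kernel NIC, IPv4 address, netmask
add_dedicated_link() {
    vswitch="$1"; uplink="$2"; portgroup="$3"; vmk="$4"; ip="$5"; mask="$6"
    # New virtual switch (skip if it already exists)
    run esxcli network vswitch standard add --vswitch-name="$vswitch"
    # Dedicated physical NIC as the uplink
    run esxcli network vswitch standard uplink add --uplink-name="$uplink" --vswitch-name="$vswitch"
    # Port group on the new vswitch
    run esxcli network vswitch standard portgroup add --portgroup-name="$portgroup" --vswitch-name="$vswitch"
    # Kernel NIC attached to that port group, with static IP settings
    run esxcli network ip interface add --interface-name="$vmk" --portgroup-name="$portgroup"
    run esxcli network ip interface ipv4 set --interface-name="$vmk" --ipv4="$ip" --netmask="$mask" --type=static
}
```

For example, add_dedicated_link vSwitch1 vmnic1 NFS vmk1 192.168.50.10 255.255.255.0 prints the five commands for review before anything is changed.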
How to Upgrade
- enable maintenance mode
- enable ESXi Shell & ssh
- See available official baselines:
esxcli software sources profile list --depot=https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml
- cut and paste output into a spreadsheet, sort, and look carefully at versions!
- Dry run test:
esxcli software profile update -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml -p <baseline to update to> --dry-run
- run upgrade:
esxcli software profile update -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml -p <baseline to update to>
- cribbed from: https://tinkertry.com/easy-update-to-esxi-67, https://neosmart.net/wiki/upgrading-vsphere-esxi-from-the-command-line-automatically, https://tinkertry.com/easy-update-to-latest-esxi
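As an alternative to the spreadsheet step, the profile list can be filtered and sorted in the shell. This is a sketch: list_profiles is a made-up helper, and it assumes the profile name is the first whitespace-separated column of the listing. The sort is by name only, so it groups date-stamped builds together but still needs a careful look at versions.

```shell
#!/bin/sh
# Sketch: read `esxcli software sources profile list` output on stdin and
# print the profile names that start with the prefix in $1, sorted by name.
# Assumption: the profile name is the first column of each row.
list_profiles() {
    awk -v prefix="$1" 'index($1, prefix) == 1 { print $1 }' | LC_ALL=C sort
}
```

Usage would be to pipe the "available official baselines" command above into list_profiles ESXi-6.7 (or whichever version prefix you are targeting), then copy the chosen name into the dry run.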
- Reboot and hit Shift-R, enter recovery mode, follow prompts. From virtubytes.
VM Management and Setup
Editing Config Files
Finding the VMWare Tools ISO
- From https://www.altaro.com/vmware/update-vmware-tools-package-esxi/
- this will disappear if you upgrade using a baseline that does not include the VMWare tools; you can manually download the support files from here
- official KB article
Moving and Cloning VMs
- use vmkfstools; OS level cp and similar may expand “thin” provisioned disks to their maximum potential size
- refs: serverfault, cloning VMs via command line
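A minimal sketch of a thin-preserving clone with vmkfstools. The helper names run and clone_thin are made up, and the paths in the usage note are placeholders. Commands are only printed unless DRY_RUN=0 is set. The source VM should be powered off first.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute only when
# DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch: clone a virtual disk, keeping the copy thin provisioned.
# $1 = source .vmdk descriptor, $2 = destination .vmdk descriptor
clone_thin() {
    # vmkfstools does not create the destination folder itself
    run mkdir -p "$(dirname "$2")"
    run vmkfstools -i "$1" "$2" -d thin
}
```

For example: clone_thin /vmfs/volumes/datastore1/seat1/seat1.vmdk /vmfs/volumes/datastore2/seat3/seat3.vmdk. This only copies the disk; the .vmx config still has to be copied and pointed at the new disk separately.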
Removing VMWare SVGA Adapter
- once GPU passthrough is working, have seen some big performance hits when the VMWare Video adapter driver is sending a copy of the screen content to the ESXi console. Sometimes random, and happens even if there is no remote console active.
- disabling in device manager doesn’t always fix this
- removing the driver reverts the driver to being the MS generic one. But still see random performance hits.
- VMware Tools updates may reinstall the driver
- most effective fix is to remove it from the VM. Shut down the VM and, using the advanced config, set
VMs Randomly Lock Up
- The USB PCI passthrough appears to fail once the VM is asleep; so there’s no way to wake up the VM(!)
- check the power management settings in the VM
- Windows defaults to going to sleep after ~15 minutes
- solution is to use the high performance energy profile in Windows or adjust the active power profile to be always on.
VMs Showing Most Hardware as ‘Removable’
- Windows guests not only allowed users to eject their USB flash drives, but also the drive controller, etc.
- ESXi removes the item from the running VM, and removes it from the config- the removal becomes permanent (until you re-add it from the ESXi console.)
- power off VM (so config can be edited and saved)
- From ESXi console -> VMs -> VM to change -> settings -> VM Options -> advanced -> Edit configuration
devices.hotplug and set to