PVS Server Memory Sizing
The Citrix PVS server uses system memory as a cache for the streaming operation. It is recommended to allocate enough memory to the server to get the best streaming performance by caching the virtual disks in RAM. I use Sysinternals RAMMap on the PVS servers under load to find the maximum cache-in-memory value for each virtual disk file (in the “File Summary” tab). For the memory sizing, I’m using a modified formula from the Citrix PVS blog:
(Recommended Memory + (#XenApp_vDisks * 4 GB) + (#XenDesktop_vDisks * 3 GB)) + 5% buffer
For example, for a Windows Server 2012 R2 PVS server streaming two XenApp (Windows Server 2012 R2) and seven XenDesktop (Windows 7 x64) virtual disks, the memory sizing formula is as follows:
(2 + (2*4) + (7*3))*1.05 = 32.55 GB of memory per PVS server
In this scenario, the PVS servers are configured with 32 GB of RAM.
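To make the sizing repeatable, here is a minimal Python sketch of the formula above; the function name is arbitrary and the 2 GB base value simply mirrors the Windows Server 2012 R2 example.

def pvs_memory_gb(base_os_gb, xenapp_vdisks, xendesktop_vdisks, buffer_pct=0.05):
    # Base OS memory + 4 GB per XenApp vDisk + 3 GB per XenDesktop vDisk, plus a 5% buffer
    raw = base_os_gb + xenapp_vdisks * 4 + xendesktop_vdisks * 3
    return raw * (1 + buffer_pct)

# Example from the text: 2 GB base, 2 XenApp vDisks, 7 XenDesktop vDisks
print(pvs_memory_gb(2, 2, 7))  # -> 32.55 GB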
PVS Ports and Threads configuration
The base formula for configuring the number of ports and the number of threads is:
“# of ports” * “# of threads/port” = “max clients”
The number of threads should match the number of virtual CPUs on the server, making it a constant in the formula. In our scenario, each PVS server has 6 virtual CPUs and will stream virtual disks to 500 targets. Note that “# of ports” can only be an integer, so 500 / 6 ≈ 83.3 is rounded up to 84, which gives 84 * 6 = 504. The PVS server will be configured with 84 listening ports (6910 to 6993) and 6 threads per port, for a theoretical maximum of 504 concurrent streaming targets.
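Since the thread count is fixed by the vCPU count, only the port count needs to be derived; a small Python sketch of that rounding:

import math

def pvs_ports(target_devices, threads_per_port):
    # Number of listening ports needed so that ports * threads >= targets
    return math.ceil(target_devices / threads_per_port)

ports = pvs_ports(500, 6)
print(ports, ports * 6)  # -> 84 ports, 504 maximum concurrent targets

first_port = 6910
print(f"port range: {first_port}-{first_port + ports - 1}")  # -> 6910-6993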
PVS MTU configuration
The default MTU size on the PVS server is set to 1506 bytes, which corresponds to the payload size + UDP header (L4) + IP header (L3) + Ethernet header (L2):
1464 (payload) + 8 (UDP header) + 20 (IP header) + 14 (Ethernet header) = 1506
To prevent PVS traffic from being rejected or fragmented, the PVS server network interface used for the streaming traffic should have a Layer 3 MTU size at least as large as the PVS server MTU size without the Ethernet header. In our case, this corresponds to 1464 + 8 + 20 = 1492 bytes. The default Layer 3 MTU size on Windows Server is usually 1500 bytes. To check the Layer 3 MTU setting on the PVS server, use the following command: “netsh interface ipv4 show subinterface“.
The global default MTU size including the Ethernet header is usually 1514 bytes (Cisco networking infrastructure). To check the global MTU size, use the ping command with a payload corresponding to the PVS server payload. Note that the ICMP header has the same size as the UDP header, so the payload doesn’t need to be adjusted for the test. For example, you can ping the gateway with a specific payload using the following command: “ping {gateway IP} -S {PVS Streaming IP} -f -l 1464“. The “-f” parameter prevents the packet from being fragmented. If the packet would be fragmented, the following error is returned: “Packet needs to be fragmented but DF set“. This indicates that the payload is too large for the current network infrastructure and PVS stream packets would be fragmented.
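The frame-size arithmetic and the don’t-fragment test above can also be scripted. The sketch below only reuses the header sizes and the ping command from this section; the gateway and streaming IP addresses passed to df_ping are placeholders to replace with your own.

import subprocess

PAYLOAD = 1464                      # PVS default payload
UDP, IP, ETHERNET = 8, 20, 14       # header sizes from the text
print("L3 packet size:", PAYLOAD + UDP + IP)             # 1492, must fit the 1500-byte Windows MTU
print("L2 frame size:", PAYLOAD + UDP + IP + ETHERNET)   # 1506

def df_ping(gateway_ip, source_ip, payload=PAYLOAD):
    # Send one don't-fragment ping sized like a PVS stream packet (Windows ping syntax)
    result = subprocess.run(
        ["ping", gateway_ip, "-S", source_ip, "-f", "-l", str(payload), "-n", "1"],
        capture_output=True, text=True)
    if "needs to be fragmented" in result.stdout:
        print("Payload too large: the network would fragment PVS stream packets.")
    else:
        print("Payload fits the current network MTU.")

# df_ping("192.168.1.1", "192.168.1.10")   # placeholder addresses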
PVS Burst Size configuration
Only 32 packets can be sent over the network in a single I/O burst, and 2 packets are reserved for traffic overhead. Therefore the I/O burst size formula is the following: (MTU payload * 30). In our case: 1464 * 30 = 43,920 bytes. The IO Burst Size will be set to 44 Kbytes.
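A quick way to double-check that arithmetic, keeping the 2 reserved packets:

MTU_PAYLOAD = 1464
MAX_PACKETS_PER_BURST = 32
RESERVED_PACKETS = 2   # reserved for traffic overhead

io_burst_size = MTU_PAYLOAD * (MAX_PACKETS_PER_BURST - RESERVED_PACKETS)
print(io_burst_size)   # -> 43920 bytes, rounded up to 44 KB in the PVS console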
PVS I/O Limit configuration
In my personal benchmarks, setting the concurrent I/O limit (transactions) slider to “0” had a negative impact on the overall streaming performance. Setting the slider to “128” for both the “local” and “remote” concurrent I/O limits gave the best performance.
PVS Buffers per threads configuration
The number of buffers per thread is calculated with the following formula: (IOBurstSize / MaximumTransmissionUnit) + 2. The IOBurstSize is obtained with the formula from the previous section (MTU payload * 30). So in our example the formula is ((1464 * 30) / 1464) + 2 = 32. The number of buffers per thread is set to “32“.
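And the matching buffers-per-thread calculation, reusing the burst size computed above:

MTU_PAYLOAD = 1464
IO_BURST_SIZE = MTU_PAYLOAD * 30   # from the previous section (43,920 bytes)

buffers_per_thread = (IO_BURST_SIZE // MTU_PAYLOAD) + 2
print(buffers_per_thread)   # -> 32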
PVS TCP/UDP Offloading
Large Send Offload (TSO) offloads the segmentation of TCP and UDP packets to the network hardware, which is not handled properly by the Citrix PVS streaming traffic. Disabling TCP and UDP offload in Windows Server 2012 R2 and in the network card driver options did not improve network performance in our case, but had a negative impact on CPU consumption under load. These settings should be tested for each deployment.
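The exact commands used for these tests are not listed here, so the sketch below is only one possible way to toggle Large Send Offload for a test run: it drives the standard Get-NetAdapterLso / Disable-NetAdapterLso PowerShell cmdlets from Python, and the adapter alias is a placeholder.

import subprocess

NIC = "PVS-Streaming"   # placeholder adapter alias, adjust for your environment

def ps(command):
    # Run a PowerShell command and return its text output
    return subprocess.run(["powershell.exe", "-NoProfile", "-Command", command],
                          capture_output=True, text=True).stdout

# Show the current Large Send Offload state on the streaming NIC
print(ps(f'Get-NetAdapterLso -Name "{NIC}"'))

# Disable LSO for IPv4 and IPv6 while testing streaming performance
# (re-enable with Enable-NetAdapterLso if CPU consumption under load gets worse)
# ps(f'Disable-NetAdapterLso -Name "{NIC}" -IPv4 -IPv6')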
PVS Network dedicated NIC configuration
The best network performance was achieved by disabling all network bindings except IPv4 on a dedicated streaming network interface, and by disabling the WINS and NetBIOS listeners.
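As an illustration only (the exact bindings removed are not listed above), the following hedged sketch disables the usual non-IPv4 bindings on a dedicated streaming NIC with the Disable-NetAdapterBinding cmdlet; the adapter alias and the component list are assumptions to adapt to your environment.

import subprocess

NIC = "PVS-Streaming"   # placeholder adapter alias

# Bindings other than IPv4 on the streaming NIC; component IDs are the usual Windows ones
# (IPv6, Client for Microsoft Networks, File/Printer Sharing, QoS, Link-Layer Topology Discovery)
BINDINGS_TO_DISABLE = ["ms_tcpip6", "ms_msclient", "ms_server", "ms_pacer",
                       "ms_lltdio", "ms_rspndr"]

for component in BINDINGS_TO_DISABLE:
    subprocess.run(["powershell.exe", "-NoProfile", "-Command",
                    f'Disable-NetAdapterBinding -Name "{NIC}" -ComponentID {component}'])

# NetBIOS over TCP/IP and the WINS listener are switched off separately in the
# adapter's advanced TCP/IP (WINS) settings, as described above.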
PVS targets RAM Cache sizing
In order to define the RAM cache size limit that prevents boot and logon performance issues, the target’s write cache is redirected to the PVS server and its size is monitored 10 minutes after the initial OS boot and 10 minutes after a user logon. Result: the operating system (Windows 7 x64) uses up to an average of 512 MB of cache at boot time. Setting at least this value on all vDisks reduces the effect of OS boot storms in the VDI infrastructure. When a test user logs on, the cache quickly fills up to an average of 1024 MB. This value should be used on all vDisks to absorb both OS boot AND logon storms in the VDI infrastructure by redirecting the I/O into the target’s RAM.
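One simple way to capture those peaks is to poll the size of the server-side write cache file during the two 10-minute windows; in this hedged sketch the cache file path is a placeholder, not the actual store layout used here.

import os
import time

# Placeholder path to one target's server-side write cache file
CACHE_FILE = r"D:\PVSStore\WriteCache\target01.vdiskcache"

def monitor_cache(path, duration_s=600, interval_s=30):
    # Poll the write cache file size (MB) for ~10 minutes after boot or logon
    peak = 0
    for _ in range(duration_s // interval_s):
        size_mb = os.path.getsize(path) / (1024 * 1024)
        peak = max(peak, size_mb)
        print(f"cache size: {size_mb:.0f} MB (peak {peak:.0f} MB)")
        time.sleep(interval_s)
    return peak

# monitor_cache(CACHE_FILE)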
PVS vDisks defragmentation
Defragmenting the master vDisk offline after a merged base is a best practice for PVS environments. The merged vDisk (VHD file) is mounted on the PVS server as a disk drive (via the Windows built-in “Mount” action in the right-click menu). The disk is then defragmented with the native Windows defragmenter (disk tools), and also with a third-party application named “Defraggler”. The latter showed slightly better results. The gain after a disk defragmentation was around 10 MB/s in read speed, measured with a benchmark tool on the PVS target.
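For reference, the same manual steps (mount the merged VHD, defragment, dismount) can be scripted; this sketch uses the built-in Mount-DiskImage / Dismount-DiskImage cmdlets and defrag.exe, and the vDisk path and drive letter are placeholders.

import subprocess

VHD_PATH = r"D:\PVSStore\vDisk-Win7x64.vhd"   # placeholder merged-base vDisk
MOUNT_LETTER = "E:"                            # placeholder drive letter assigned after mounting

def ps(command):
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)

# Mount the merged VHD offline (same as the Explorer right-click "Mount" action)
ps(f'Mount-DiskImage -ImagePath "{VHD_PATH}"')

# Defragment with the built-in Windows defragmenter (/U shows progress, /V is verbose)
subprocess.run(["defrag", MOUNT_LETTER, "/U", "/V"], check=True)

# Detach the vDisk before putting it back into production
ps(f'Dismount-DiskImage -ImagePath "{VHD_PATH}"')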