Last year Citrix published two blog posts about the (relatively) new “cache in device RAM with overflow on hard disk” feature.
In these posts Citrix describes the performance gain you get when configuring the write cache in RAM with overflow on disk. It was already possible to place the write cache in RAM (without overflow), but the downside of that write cache type was that you could not configure the amount of RAM used for it. When the RAM was completely full, the target device would crash with a blue screen of death. The memory used for the write cache is not available to the system and is not given back when freed. In addition, the new driver architecture created for the new RAM cache type appears to be a lot faster than the old cache in RAM type.
In the blog posts Citrix already explained everything in detail and analyzed all the numbers. I thought it would be interesting to reproduce some of the tests myself and compare the results with the performance in an existing (6.1) PVS environment that uses the cache on device hard drive cache type.
For the tests I looked at the IOPS measured with IOMeter and at the boot time of the target device. The tests were performed on two Windows Server 2008 R2 target devices running XenApp 6.5. Server1 connects to a PVS 6.1 server with cache on device hard drive. Server2 connects to a PVS 7.6 server with cache in device RAM with overflow on hard disk. Two identical vDisks were created automatically with our deployment tool (called Raido Taskflow). The only difference between the vDisks is the version of the target device agent software, which has to match the respective PVS server. All servers run as virtual machines on a VMware ESX hypervisor. Both PVS servers have the same resource specifications, as do both target devices:
PVS Server Specifications:
- 8 GB RAM
- 2 vCPU
- 1 vDisk store on local disk
Target Device Specifications:
- 16 GB RAM
- 4 vCPU
I used IOMeter with the same configuration as described in the Citrix blog part 2:
- 4 workers
- Queue depth set to 16 for each worker
- 4 KB block size
- 80% Writes / 20% Reads
- 90% Random IO / 10% Sequential IO
The only things I configured differently were:
- A test file size of 2.5 GB (which corresponds to 5242880 sectors in IOMeter)
- A test time of 10 minutes.
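The sector count in the list above can be checked with a few lines of Python. This is only a sketch, and it assumes that IOMeter counts the test file size in 512-byte sectors and that "GB" here means binary gigabytes (2^30 bytes); the function names are made up for illustration.

```python
# Assumption: IOMeter expresses the test file size in 512-byte sectors.
SECTOR_BYTES = 512
BYTES_PER_GIB = 2**30  # "GB" is treated as a binary gigabyte here


def sectors_to_gib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count to a size in GiB."""
    return sectors * sector_bytes / BYTES_PER_GIB


def gib_to_sectors(gib: float, sector_bytes: int = SECTOR_BYTES) -> int:
    """Convert a size in GiB to the sector count to enter in IOMeter."""
    return int(gib * BYTES_PER_GIB / sector_bytes)


if __name__ == "__main__":
    print(sectors_to_gib(5242880))  # 2.5
    print(gib_to_sectors(2.5))      # 5242880
```

The same helper gives the sector count for any other test file size you might want to try, for example one that deliberately exceeds the RAM cache.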
Test 1: PVS 6.1 server Cache on Disk
The first test was performed against the PVS 6.1 server with cache on disk. IOMeter first writes a 2.5 GB test file (iobw.tst) to C:\. You can see the I/Os hitting the disk because the .vdiskcache file on the target device’s persistent disk keeps growing.
Since this is a single large file, disk performance was reasonable at a stable 35 MB/s.
Once this was done, the actual IOMeter test began. Four threads perform I/Os, stressing the system’s disk. You can immediately see that disk performance is a lot worse: it only reached momentary peaks of 4 MB/s.
The results of the test are displayed below:
As you can see, the results are not that great. CPU is definitely not the bottleneck here.
Test 2: PVS 7.6 server vDisk 3GB RAM cache with overflow
The next test was performed on the new PVS 7.6 environment with cache in device RAM with overflow. The RAM cache size is configured at 3 GB. This ensures that all testing is done in RAM and nothing overflows to disk. Otherwise, the same IOMeter configuration is used as in test 1.
When IOMeter was preparing the disk by writing the iobw.tst file to C:\, it was so fast that I didn’t have time to take a screenshot. During the test it became clear that throughput would be a lot higher than in test 1 (4 MB/s versus 486 MB/s).
As you can see, D:\.vdiskcache no longer exists with this write cache type and is replaced by vdiskdif.vhdx. Because no I/Os are hitting the persistent disk, D:\vdiskdif.vhdx does not appear in the screenshot above. During the test vdiskdif.vhdx kept its original size of 4,096, meaning no overflow to disk happened. When the test was done, the results were as follows:
That’s pretty amazing if you ask me! You can also see the CPU is more heavily used in this test.
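The overflow check used in this test (watching whether vdiskdif.vhdx grows beyond its initial size) can also be scripted instead of read off a screenshot. The sketch below is not part of the original test setup. The function names are hypothetical, and it assumes the file size can be read with a plain `os.path.getsize` call while the PVS driver has the file open.

```python
import os
import time


def watch_file_growth(path, samples=5, interval_s=1.0):
    """Poll a file's size and return the observed sizes in bytes."""
    sizes = []
    for i in range(samples):
        sizes.append(os.path.getsize(path))
        if i < samples - 1:
            time.sleep(interval_s)
    return sizes


def has_overflowed(sizes):
    """Return True if the file grew beyond its first observed size."""
    return max(sizes) > sizes[0]


# Hypothetical usage on a target device during an IOMeter run
# (one sample every 10 seconds for 10 minutes):
#
#   sizes = watch_file_growth(r"D:\vdiskdif.vhdx", samples=60, interval_s=10)
#   print("overflow to disk" if has_overflowed(sizes) else "all writes stayed in RAM")
```

Pointing the same function at D:\.vdiskcache on a PVS 6.1 target device would show the cache-on-disk file growing throughout the test.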
Test 3: PVS 7.6 server vDisk 1GB RAM cache and 2.5 GB test file
Next I ran the same test as above, but with the vDisk configured to use only 1 GB of RAM cache. When IOMeter started disk preparation (writing the iobw.tst file to C:\), performance was initially very high, but once the 1 GB cache limit was reached the I/Os started spilling over to the persistent disk and performance dropped dramatically.
Another thing that seems to improve with the RAM cache is boot time.
Boot time of a PVS 6.1 target device with write cache on disk:
Boot time of a PVS 7.6 target device with write cache (3GB) in RAM:
That’s a difference of roughly 20 seconds!
I’ve consolidated the results in the table below:
| Test result | PVS 6.1 | PVS 7.6 3GB Cache | PVS 7.6 1GB Cache |
|---|---|---|---|
| Total I/Os per second | 459.13 | 123484.80 | 717.55 |
| Total MBs per second | 1.88 | 506.79 | 2.94 |
| Average I/O response time (ms) | 139.38 | 0.52 | 89.18 |
| % CPU utilization (total) | 1.59 | 66.20 | 1.96 |
| Boot time (seconds) | 45 | 24 | / |
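To put the table in perspective, the sketch below computes the speedup relative to PVS 6.1 and runs a quick sanity check on the figures. The check uses Little's law (average I/Os in flight = IOPS × response time), which is my addition and not something from the Citrix posts. With 4 workers at a queue depth of 16, about 64 I/Os should be outstanding at any time, and all three columns come out close to that.

```python
# Figures copied from the results table above.
results = {
    "PVS 6.1": {"iops": 459.13, "latency_ms": 139.38},
    "PVS 7.6 3GB cache": {"iops": 123484.80, "latency_ms": 0.52},
    "PVS 7.6 1GB cache": {"iops": 717.55, "latency_ms": 89.18},
}

# IOMeter configuration used in all tests.
WORKERS = 4
QUEUE_DEPTH = 16


def speedup(metric, baseline="PVS 6.1"):
    """Ratio of each configuration's metric to the baseline's."""
    base = results[baseline][metric]
    return {name: row[metric] / base for name, row in results.items()}


def outstanding_ios(iops, latency_ms):
    """Little's law: average I/Os in flight = throughput x response time."""
    return iops * latency_ms / 1000.0


if __name__ == "__main__":
    iops_speedup = speedup("iops")
    print(f"expected I/Os in flight: {WORKERS * QUEUE_DEPTH}")
    for name, row in results.items():
        in_flight = outstanding_ios(row["iops"], row["latency_ms"])
        print(f"{name}: {iops_speedup[name]:.1f}x IOPS, "
              f"~{in_flight:.1f} I/Os in flight")
```

With these numbers, the 3 GB RAM cache delivers roughly 269 times the IOPS of the PVS 6.1 cache on disk, while the 1 GB cache that overflows to disk manages only about 1.6 times.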
There is no doubt that using the RAM cache with a PVS 7.6 target device brings great performance improvements. It benefits not only the target device but potentially also your storage subsystem, because I/Os are offloaded to memory. For best performance, size the RAM cache large enough that all I/Os are served from RAM. When you have little memory to spare, chances are that the system will claim some of the write cache memory. This could hurt performance, but it could still dramatically lower the number of I/Os performed on your storage subsystem. Besides, the risk of running out of memory is very low. I would strongly advise upgrading to this new write cache type even if you have little memory to spare, but as always, test and monitor your configuration first!