Storage Protocol Choices & Storage Best Practices for VMware ESX

By admin, October 15, 2010 11:33

Just found a great article on ESX storage best practices from Cisco. It is definitely worth reading to understand how storage really works in VMware vSphere.

At the end it also looks at the future, including VAAI. The paper was written in 2009, and one year later we are already using VAAI on our Equallogic SAN. :)

There is even a section called “Day After Tomorrow” covering future technologies like vMotion between datacenters, DRS and DPM for storage, etc.


By admin, October 14, 2010 15:24

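When it comes to Ferrari’s signature models, the 288 GTO, F40, F50 and ENZO surely rank at the very top!

288 GTO, F40, F50, ENZO: four names that resound through the supercar world. They stand for red passion, Italian romance and the envious gaze of countless admirers.

At last, this issue of the British magazine Classic & Sports Car has arrived with its biggest feature of the year: FASTEST EVER FERRARIS 288 GTO, F40, F50, ENZO. This issue is definitely one worth collecting.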



Finally surrender to Smart Phone – iPhone 4

By admin, October 14, 2010 12:21

In the past few years, there have been numerous times I’ve seriously considered buying a smart phone like an HTC or an iPhone, but the technology wasn’t really ready back then.

However, things changed with the rise of HSDPA and particularly the launch of iPhone 4 earlier this year.

It’s time to CHANGE finally!


Today I’ve finally got an iPhone for myself… technically speaking it’s not for me, it’s actually for her. After comparing many plans from different service providers in Hong Kong, I finally settled on SmartTone, mainly due to its network coverage and reliability, and the HK$398 unlimited plan is the best way to go for a 32GB version. It took about 1 week to order and the customer service is excellent!

In addition, SmartTone embedded something called X-Power, which allows you to view Flash movies on YouTube and many other web sites. That’s definitely a PLUS, as normal users really have no idea how to “JailBreak”.

For me, a network guy, I found the most useful tool is WYSE PocketCloud which provides the best RDP mouse control ability among its category and I am going to manage the whole data center through a little device like iPhone 4, wow…that’s awesome really! Basically this is ONE AND ONLY reason I surrender to iPhone, nothing else really.


Probably I will try Dropbox later which allows me to synchronize MP3/documents between my desktop and iPhone, but putting sensitive information on their server will be a big concern for many.

Finally, the 3G roaming service is still very disappointing: there is NO UNLIMITED usage when I go to other countries. Even though SmartTone/3/One2Free offer a so-called HK$168 daily UNLIMITED PLAN, it has HIDDEN clauses that you can ONLY use email and browse web pages, and anything else like RDP/VPN/FTP/SKYPE will cost you HK$0.01/KB. So it’s absolutely USELESS when traveling abroad; only Wi-Fi works, but then again why do I need to buy an iPhone without using 3G? It’s like driving a HK$3M Ferrari on country roads only, when you are not allowed to drive on the highway.

Mouse is very slow in Windows Server 2008 R2 under ESX 4.1

By admin, October 12, 2010 13:04

Basically, all you need to do is update the SVGA driver to the WDDM driver, but why didn’t VMware include that in its latest VMware Tools?

Troubleshooting SVGA drivers installed with VMware Tools on Windows 7 and Windows 2008 R2 running on ESX 4.0

WDDM and XPDM graphics driver support with ESX 4.x, Workstation 7.0, and Fusion 3.0

Solution to Dell OEM Windows Server Requires Re-Activation in ESX 4.1

By admin, October 12, 2010 09:26

So you have been there and hit that annoying problem. You called Dell Pro-Support and they told you there is DEFINITELY NO WAY. You also called Microsoft, who pointed the finger back at Dell and asked you to contact Dell directly, as it’s an OEM product. You asked the local Microsoft distributor, and they also said there is no way to do it: you have to buy a Box set or an Open License, because your existing Dell OEM license will not allow you to reactivate using the key printed on it.



  • Dell’s Pro-Support is unprofessional in this case.
  • Microsoft is responsible for its own product, NOT!
  • Local Microsoft Distributor wants you to pay more, huh?


This is the official solution from Dell, and I hope it’s useful for others. The key point is to use the Virtual Key to re-activate, then activate either on-line or by phone, and finally clone the VM as a master gold image for further deployment.

You cannot automatically pre-activate the Windows Server 2008 operating system installed on VMs by using the product activation code in the Dell OEM installation media. You must use the virtual product key to activate the guest operating system. For more information, see the whitepaper Dell OEM Windows Server 2008 Installation on Virtual Machines using Dell OEM Media at


I always thought the Virtual Key was for Microsoft’s own Hyper-V only and could not be used in a VMware environment, but I was wrong.


Alternatively, you can force the VM to load the default BIOS containing DELL SLIC 2.1 (supports Windows 7 and Windows Server 2008 R2), which will trick the VM into thinking it’s actually a PHYSICAL DELL server.

1. Simply add bios440.filename = "DELL.ROM" to the VM configuration parameters using the VC Client. Of course you do need to upload DELL.ROM to your VM directory, and please don’t ask me where to get that DELL.ROM, google around yourself. One drawback is that this VM can’t be vMotioned around, as the DELL.ROM won’t get vMotioned. (Update: the solution for vMotion is to put DELL.ROM on every host or simply on the SAN, such as bios440.filename = "/vmfs/volumes/san/DELL.ROM". :)

2. Very importantly, you will also need to find the corresponding certificate dell.XRM-MS, then use slmgr.vbs -ilc c:\dell.XRM-MS to import the certificate.

3. Insert the Key by slmgr.vbs -ipk XXXXX-XXXXX-XXXXX-XXXXX-XXXXX


Finally, some say that even just adding SMBIOS.reflectHost = "true" will work, but I COULD NEVER get this method working!

Update: The reason I didn’t get it working is because I didn’t use Dell’s W2k8R2 installation disk, see this link from IBM, sounds so simple! Really?


Edit the virtual machine’s .vmx file to contain the following line:

SMBIOS.reflectHost = "true"

Note: Encoding of the text added to the .vmx file must be in UTF8.
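
The edit described above can be sketched in Python. This is only an illustration, assuming the VM is powered off while its .vmx file is changed and that keys can be matched case-insensitively; the helper names `set_vmx_option` and `update_vmx_file` are made up for this example, and only the `SMBIOS.reflectHost` key comes from the note.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set, replacing any existing entry."""
    wanted = f'{key} = "{value}"'  # straight ASCII quotes, not curly ones
    out, replaced = [], False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        # Assumption: .vmx keys are compared case-insensitively.
        if name.lower() == key.lower():
            if not replaced:
                out.append(wanted)
                replaced = True
            # any further duplicate entries for the same key are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(wanted)
    return "\n".join(out) + "\n"


def update_vmx_file(path: str, key: str, value: str) -> None:
    """Rewrite the file at `path` in UTF-8, as the note requires."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_vmx_option(text, key, value))
```

Calling `update_vmx_file` with a (hypothetical) path to the VM’s .vmx file, the key `SMBIOS.reflectHost` and the value `true` would add or replace the single line; running it twice leaves the file unchanged.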

This updates the virtual machine BIOS with the IBM Original Equipment Manufacturer (OEM) information required to use IBM-provided Operating System (OS) installation media.

IBM-provided Microsoft Windows 2008 media must be “BIOS Locked” to ensure that the OS will only install on IBM hardware. Virtual machines use a virtual BIOS that does not contain information that identifies the system as being manufactured by IBM.

The installation of Microsoft Windows Server 2008 from IBM OEM media to such a virtual machine will fail until the virtual BIOS has been updated to include this information. Alteration of the virtual machine’s .vmx file to state SMBIOS.reflectHost = “true” performs this function for servers using VMware’s ESX/ESXi technology.

The workaround resolves this issue by using media that is not locked to a specific OEM.

The solution resolves this issue by adding IBM information to the virtual BIOS.

Update Apr-16

Tried again today: the method SMBIOS.reflectHost = "true" is DEFINITELY NOT working! Even with Dell’s OEM w2k8r2 std installation disk and a PowerEdge R710 server, it still asked for activation. In addition, I discovered I can install Dell’s OEM w2k8r2 std disk on a VM even without SMBIOS.reflectHost = "true", so this means Dell’s w2k8r2 disk can be used on a non-Dell server.

So only the first two methods above are working, but not the last one. If you got the last one working, pls drop me a line, thanks.

Update Apr-17

Maybe the answer is that SMBIOS.reflectHost = "true" WILL ONLY WORK for ESX 3.5 or earlier, as VMware’s KB didn’t indicate that this method applies to ESX 4.0.

ESX 4.1, VM is Version 7, VMware PVSCSI and VMXNet3, Safely Remove Hardware?

By admin, October 11, 2010 11:15

After upgrading to ESX 4.1, my VM (hardware Version 7, VMware PVSCSI and VMXNet3) started to show a “Safely Remove Hardware” icon in the tray. But why would you want to remove your hard disk and NIC? Huh?

Then I found this useful link, case solved!

To Thin or Not To Thin? On Equallogic and/or ESX Datastore?

By admin, October 10, 2010 12:42

About a month ago, I was told by an experienced Dell Equallogic consultant to use Normal (non-thin) on the EQL array and Thin on ESX VMFS. I wasn’t exactly sure what he meant at the time.

So I did a simple test on my EQL box:

Create a 10GB volume (non-thin), attach to Windows, write 5GB, then remove 4GB, leave with 1GB, to EQL it’s 5GB used.

Then I write another 4GB, EQL still reports 5GB, then I write 1GB more, now EQL reports 6GB.

However, in my Thin Provisioning test with the same 10GB volume size, things looked completely different.

Create a 10GB volume, attach to Windows, write 5GB, then remove 4GB, leave with 1GB, to EQL it’s 5GB used.

Then I write another 4GB, and somehow the EQL volume reports its size growing continuously to 5GB, then 6GB, then finally, bang, 9GB. WHY? WHY doesn’t it use the UNUSED SPACE? (Inside Windows it’s actually still 5GB, as you will see later.)

HOWEVER, please note THIS: as I continue to add another 4GB to the volume (EQL now reports 9GB, Windows reports 5GB), EQL reaches the 10GB max (somehow the volume didn’t go offline, why? I don’t know), but I can still add this 4GB to the volume, and Windows reports 9GB/10GB used.

So in a strange way, even when EQL reports the volume as fully used, we can still add data to it at the Windows level. It’s just TOTALLY CONFUSING, and the false “volume is going to be full” alarm will fire all the way when using Thin Provisioning.

That’s why we should WAIT UNTIL FW5.0.x or FW5.x comes out with a REAL THIN RECLAMATION feature like the one HDS and 3PAR delivered a year ago. (Yes, EQL is behind in this particular area.)

We are probably better off NOT USING Thin Prov. on EQL. What I mean is:

Thick Prov. on EQL but Thin on ESX VMFS would be the best way.

For snapshots, just set the reserve to a smaller % during volume creation (10% would be good, as you can always grow it later). This applies to the volume as well: make your own Thin Provisioning by setting the volume to a smaller size when you first create it, then gradually expand it as you need later, so you won’t waste a lot of space from the beginning.



I did another test and it proved I was wrong above.

The GB are reported in EQL Group Manager under Volume Used Size

Step  Change  Thick (20GB)  Thin (20GB)
1.    +5GB    5GB           5GB
2.    +5GB    10GB          10GB
3.    -5GB    10GB          10GB
4.    +5GB    10GB          10GB
5.    -10GB   10GB          10GB
6.    +15GB   15GB          15GB (Warning as over the default 60%)
7.    -5GB    15GB          15GB
8.    +5GB    15GB          15GB

So we are safe to use Thin Provisioned VMFS now I think. 
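
The numbers in the table above are consistent with a simple "high-water mark" model, sketched below in Python. This is my own toy model, not anything from EQL: it assumes the file system reuses freed blocks before touching new ones, and that the array never reclaims a block once it has been written. The function name `replay` is made up for the illustration.

```python
def replay(changes_gb):
    """Replay writes (+) and deletes (-) and track what each side reports.

    Returns a list of (change, filesystem_used, array_used) tuples in GB.
    """
    fs_used = 0      # what the guest file system reports as in use
    array_used = 0   # what the array reports: blocks ever touched
    rows = []
    for delta in changes_gb:
        fs_used += delta
        # Assumption: freed blocks are reused first, so the array figure
        # only grows when the file system exceeds its previous peak.
        array_used = max(array_used, fs_used)
        rows.append((delta, fs_used, array_used))
    return rows


steps = [5, 5, -5, 5, -10, 15, -5, 5]
for number, (delta, fs, arr) in enumerate(replay(steps), 1):
    print(f"{number}. {delta:+d}GB  filesystem={fs}GB  array={arr}GB")
```

Under those assumptions the array column comes out as 5, 10, 10, 10, 10, 15, 15, 15, the same as both the Thick and Thin columns.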

Btw, I also received a reply from EQL indicating they are working on the Re-Thin feature.

In response to “reclaiming unallocated array disk space” on the PS Series arrays:

An enhancement request for this feature (reclaim space that was previously used) has already been submitted.  Firmware version 5.0.2 does not introduce this feature.  Engineering has not updated support as to when such a feature will be available in future firmware releases.

Finally, I looked into the details of Hitachi HDS’s Re-Thin feature. A 3PAR guy points out that HDS’s Re-Thin is in fact a… migration followed by zeroing out the unused blocks, whereas 3PAR can REALLY, I MEAN REALLY, do the Re-Thin in real time, with no need to copy the volume to another copy and then zero out the unused blocks. I do hope Equallogic can have this kind of feature instead of a “not so real” Re-Thin like the HDS one.


Oct 14, 2010: An update from Dell Pro-Support explaining that NTFS/VMFS can somehow REUSE the touched blocks.

A similar problem is when the initiator OS reports significantly more space in use than the array does. This can be pronounced in systems like VMWare that create large, sparse files. In VMWare, if you create yourself a 10GB disk for a VM as a VMDK file, VMWare does not write 10GB of zeros to the file. It creates an empty (sparse) 10GB file, and subtracts 10GB from free space. The act of creating the empty file only touches a few MB of actual sectors on the disk. So VMWare says 10GB missing, but the array says, perhaps, only 2MB written to.

Since the minimum volume reserve for any volume is 10%, the filesystem has a long way to go before the MB-scale writes catch up with the minimum reservation of a volume. For instance, a customer with a 100GB volume might create 5 VMs with 10GB disks. That’s 50GB used according to VMWare, but only perhaps 5 x 2MB (10MB) written to the array. Until the customer starts filling the VMDK files with actual data, the array won’t know anything is there. It has no idea what VMFS is; it only knows what’s been written to the volume.

• Example: A file share is thin-provisioned with 1 TB logical size. Data is placed into the volume so that the physical allocation grows to 500 GB. Files are deleted from the file system, reducing the reported file system in use to 100 GB. The remaining 400 GB of physical storage remains allocated to this volume in the SAN.

• This issue can also occur with maintenance operations including defragmentation, database re-organization, and other application operations.

In most environments, file systems do not dramatically reduce in size, so this issue occurs infrequently. Also some file systems will not make efficient re-use of previously allocated space, and may not reuse deleted space until it runs out of unused space (this is not an issue for NTFS, VMFS).
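
The arithmetic in Dell’s two examples above can be written out as a short Python sketch. The function names are hypothetical and the figures are simply the ones quoted (10GB disks, roughly 2MB touched per sparse VMDK, 500GB peak allocation against 100GB in use).

```python
def array_view(vm_disk_gb, vm_count, touched_mb_per_disk):
    """Compare what the hypervisor counts as used with what the array saw.

    Returns (GB reserved according to the hypervisor, MB written to the array).
    """
    hypervisor_reserved_gb = vm_disk_gb * vm_count
    array_written_mb = touched_mb_per_disk * vm_count
    return hypervisor_reserved_gb, array_written_mb


def stranded_gb(peak_allocated_gb, fs_in_use_gb):
    """Space that stays allocated on the SAN after files are deleted."""
    return peak_allocated_gb - fs_in_use_gb


print(array_view(10, 5, 2))   # (50, 10): 50GB per VMware, 10MB per the array
print(stranded_gb(500, 100))  # 400: GB still allocated but no longer in use
```

The two cases pull in opposite directions: sparse VMDKs make the host report more than the array, while deletions make the array report more than the host.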


Update Oct-15-2010

If you ask me again now, I would say THIN PROVISIONING (aka TP) ALL THE WAY, both on Equallogic AND on the ESX Datastore, is the BEST way to go, and I think it is going to be the trend in the storage management world, especially if Equallogic releases its upcoming Re-Thin or Space Reclaim feature in a coming 5.x firmware update. (So far only 3PAR is able to do it, I think.)

Update Sep-3-2011

Storage APIs for Array Integration (VAAI) has been enhanced to reclaim blocks when a virtual disk is deleted. Previously the storage array was not aware that blocks still holding data had been freed after virtual disks were deleted.

This new feature in vSphere 5.0 may finally solve the problem, but is it only going to work in vSphere 5.0? I really do hope ESX 4.1 can also get this VAAI enhancement after the Equallogic firmware is upgraded with such a thin provisioning reclaim capability.

Currently, the only way to reclaim a thin provisioned volume (TP) in Equallogic is to Storage VMotion all existing VMs to a new TP volume and then delete the existing one.

Impressive Equallogic PS6000XV IOPS result

By admin, October 9, 2010 11:41


I just performed the test again 3 times and confirmed the following. This is with the default 1 Worker only, IOmeter testing using the VM’s VMFS disk directly, no MPIO direct mapping to the EQL array; the VM is version 7, the disk controller is Paravirtualized and the NIC is VMXNet3.
SERVER TYPE: VM on ESX 4.1 with EQL MEM Plugin, VAAI enabled with Storage Hardware Acceleration
HOST TYPE: Dell PE R710, 96GB RAM; 2 x XEON 5650, 2.66 GHz, 12 Cores Total
STORAGE TYPE / DISK NUMBER / RAID LEVEL: Equallogic PS6000XV x 1 (15K), / 14+2 600GB Disks / RAID10 / 500GB Volume, 1MB Block Size
SAN TYPE / HBAs : ESX Software iSCSI, Broadcom 5709C TOE+iSCSI Offload NIC

TEST NAME                 Av. Resp. Time (ms)   Av. IOs/sec   Av. MB/sec

Max Throughput-100%Read   4.1913                13934.42      435.45

Max Throughput-50%Read    5.5166                10240.39      320.01


EXCEPTIONS: CPU Util. 67.82, 38.12, 56.80, 40.2158%;

RealLife-60%Rand-65%Read 4051 IOPS is really impressive for a single array with 14 15K RPM spindles!
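
As a sanity check on the two Max Throughput rows above, MB/sec should equal IOs/sec multiplied by the transfer size. The Python sketch below does that calculation. Note that the 32 KB transfer size is my assumption, inferred because it reproduces the reported figures; it is not stated in the test output.

```python
def mb_per_sec(iops: float, block_kb: float) -> float:
    """Throughput in MB/sec for a given IO rate and transfer size in KB."""
    return iops * block_kb / 1024


BLOCK_KB = 32  # assumption: inferred from the numbers, not stated in the results

# test name -> (reported Av. IOs/sec, reported Av. MB/sec)
results = {
    "Max Throughput-100%Read": (13934.42, 435.45),
    "Max Throughput-50%Read": (10240.39, 320.01),
}

for name, (iops, reported_mb) in results.items():
    computed = round(mb_per_sec(iops, BLOCK_KB), 2)
    print(f"{name}: computed {computed} MB/sec, reported {reported_mb} MB/sec")
```

With 32 KB transfers the computed values match the reported 435.45 and 320.01 MB/sec to two decimals, so the IOs/sec and MB/sec columns agree with each other.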

Windows Server Backup in Windows Server 2008 R2

By admin, October 8, 2010 13:17

I’ve been using Windows Server Backup in Windows Server 2008 R2 for almost a month, and found it can do everything Acronis True Image Server (TIS) does. In my own opinion there is really no need to buy TIS in the future.

See what WSB can offer you (my requirement list):

  • WSB works at Block Level, so taking a snapshot is very fast during backup, and restore is fast as well.
  • Full server bare-metal backup.
  • Full backup the first time and incremental afterwards, excluding files during backup, compression.
  • System State is integrated with AD, so you won’t end up with a crash-consistent state where the server is restored but AD cannot be started.
  • Individual folder/file restore.
  • Backup to network shared folders (there is a limit: you cannot keep incremental copies, only ONE copy, and the later one will overwrite the previous one), this does suck badly! However I don’t use a network folder to store my backups, so it’s fine for me.
  • Maximum 64 copies (in my terms that’s almost 2 months, since I only have 1 scheduled backup running), or limited by your backup disk size.
  • The backup copies are hidden from the file system; in TIS you need to create a partition to hide the backup copies (Acronis Secure Zone).
  • WSB can back up Hyper-V VM images as well.

Best of all, Windows Server Backup (WSB) in Windows Server 2008 R2 IS FREE! TIS is over USD1,200, and I don’t need features like Convert to VM, Universal Restore or Central Management, so WSB works perfectly for a standalone server.

Finally, you may ask: what about the rescue bootable DVD/CD-ROM? Well, you don’t have one. What? Yes, the Windows Server 2008 DVD is your ultimate rescue bootable DVD, fair enough? :)

A Possible Bug in VMware ESX 4.1 or EQL MEM 1.0 Plugin

By admin, October 8, 2010 00:37

This week I encountered a strange problem in redundancy testing. All paths to our network switches, servers and EQL arrays have been set up correctly with redundancy. Each ESX Host iSCSI VMKernel (or pNIC) has 16 paths to the EQL arrays, and we tested every single possible failure situation and found ONLY ONE scenario that doesn’t work. See When Power OFF Master PC5448 Switch, we are no longer able to ping PS6000XV.

After two days of troubleshooting with local Pro-Support as well as US EQL support, we have narrowed down the problem to “A Possible Bug in VMware 4.1 or EQL MEM 1.0 Plugin”.

During the troubleshooting, I found another Equallogic user in Germany who is having a similar, though not exactly the same, problem (see Failover problems between esx an Dell EQL PS4000). He’s using a PS4000 and only has two iSCSI paths, and his problem is more serious than ours.

Oct 5, 2010 3:16 AM
Fix-List from v5.0.2 Firmware:
iSCSI Connections may be redirected to Ethernet ports without valid network links.

Also, his problem is similar in that whatever iSCSI connection is left in the LAG won’t get redirected to the slave switch after the master switch is shut down. I have 4 paths and his PS4000 has two, so my iSCSI connections survived thanks to the extra path to the slave switch, but somehow vmkping doesn’t work.

And if you look at comment #30:

Jul 27, 2010
Dell acknowledged that the known issue they are reporting in the manual of the EqualLogic Multipathing Extension Module is the same one I get.

They didn’t open a ticket at vmware for now, but they will, after some more tests.

I think this issue is there since esx 4.0. In VI3 they used only one vmkernel for swiscsi with redundancy on layer1/2, so there it should not be the case.

My case number for this issue at vmware is 1544311161, the case number at dell is 818688246.

If vmware acknowledge this as a bug in 4.1, and don’t have a workaround, we will go with at least 4 logical paths for each volume and hope that at least one path is still connected after switch1 fails, until they fix it.
Finally, it could also be something related to EQL MEM Plugin for ESX which we have installed. (Comment #29 on page 2)

It indicates there is a known issue: when a network link fails (for example because the master switch is shut down) and the physical NIC with the failure is the only uplink for the VMKernel port used as the default route for the subnet, several types of kernel network traffic are affected, including the ICMP pings which the EqualLogic MEM uses to test for connectivity on the SAN.

Jul 23, 2010

from the dell eql MEM-User_Guide 4-1:

Known Issues and Limitations
The following are known issues for this release.

Failure On One Physical Network Port Can Prevent iSCSI Session Rebalancing
In some cases, a network failure on a single physical NIC can affect kernel traffic on other NICs. This occurs if the physical NIC with the network failure is the only uplink for the VMKernel port that is used as the default route for the subnet. This affects several types of kernel network traffic, including ICMP pings which the EqualLogic MEM uses to test for connectivity on the SAN. The result is that the iSCSI session management functionality in the plugin will fail to rebuild the iSCSI sessions to respond to failures of SAN changes.

Could it be the same problem I have? So they (DELL/VMware) already know about this problem?

Aside from this, it looks like the Dell MEM only makes sense in setups with more than one array per psgroup, because the PSP selects a path to an interface of the array where the data of the volume is stored. And it has a lot of limitations. We only have one array per group for now, so I think I’ll skip this.

I still don’t understand why there is no way to prevent the connections from going through the LAG in the first place; it should be possible to prefer direct connections…

My last reply to EQL Support today:

Some updates, maybe you can pass them to L3 for further analysis.

The problem seemed to be due to EQL MEM version 1.0 Known Issue. (User Manual 4-1)

Failure On One Physical Network Port Can Prevent iSCSI Session Rebalancing

In some cases, a network failure on a single physical NIC can affect kernel traffic on other NICs. This occurs if the physical NIC with the network failure is the only uplink for the VMKernel port that is used as the default route for the subnet. This affects several types of kernel network traffic, including ICMP pings which the EqualLogic MEM uses to test for connectivity on the SAN. The result is that the iSCSI session management functionality in the plugin will fail to rebuild the iSCSI sessions to respond to failures of SAN changes.
I’ve performed the test again and it showed your prediction is correct that the path is always fixed to vmk2.

1. I’ve restarted the 2nd ESX host, and found the paths C0 to C1 showed correctly.
2. Then I rebooted the master switch. From the 1st or 2nd ESX Host, I cannot ping vCenter or vmkping EQL or the iSCSI vmks on other ESX hosts, and this time I CANNOT PING OTHER VMKernel ports such as VMotion or FT either.
3. But the VMs did not crash or restart, so underneath, all iSCSI connections stay on-line, that’s good news.
4. After the master switch comes back, under storage paths on the ESX Hosts, I see the C4 and C5 paths generated.

Could you confirm with VMware and EQL if they have this bug in their ESX 4.1 please? (ie, Path somehow is always fixed to vmk2)

I even did a test by switching off the iSCSI ports on the master switch one by one, and the problem ONLY HAPPENS when I switch off the ESX Host VMK2 port (which is the first iSCSI vmk port, ie, the default route for the subnet?).

This confirmed that vmkping IS BOUND to the 1st iSCSI vmk port, which is vmk2, and this time all my vmkpings were dead, including VMotion as well as FT.

The good news is underneath the surface everything works perfectly as expected, iSCSI connections are working fine, VMs are still working and they can be VMotioned around, FT is also working fine.

We do hope VMware and EQL can release a patch soon to fix this vmkping problem. vmkping always has to go out through the default vmk2, which is the 1st iSCSI VMKernel port in the vSwitch, and never through any other vmk connection. So when the master switch dies, vmkping dies with vmk2, as vmkping uses vmk2 for ICMP pings to other VMKernel IP addresses.
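
The behaviour described above can be captured in a toy Python model. It is only my reading of the symptom, not how ESX is actually implemented: kernel pings are assumed to leave through the first iSCSI vmk port on the subnet, while iSCSI sessions survive as long as any port still has a working uplink. The class and function names, and the port names other than vmk2, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmkPort:
    name: str
    uplink_up: bool = True


def default_route_port(ports):
    """Assumption: the first iSCSI vmk on the subnet carries the default route."""
    return ports[0]


def vmkping_works(ports):
    """Pings only succeed while the default-route port still has its uplink."""
    return default_route_port(ports).uplink_up


def iscsi_sessions_alive(ports):
    """iSCSI keeps running as long as at least one port has a working uplink."""
    return any(port.uplink_up for port in ports)


# Four iSCSI VMKernel ports; only vmk2 is named in the post, the rest are made up.
ports = [VmkPort("vmk2"), VmkPort("vmk3"), VmkPort("vmk4"), VmkPort("vmk5")]

ports[0].uplink_up = False  # master switch goes down and takes the vmk2 uplink with it
print("vmkping works:", vmkping_works(ports))
print("iSCSI sessions alive:", iscsi_sessions_alive(ports))
```

In this model, losing any uplink other than the first leaves vmkping working, which matches the port-by-port test where only switching off the vmk2 port triggered the problem.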
