Category: Equallogic & VMware (Virtualization Technology)

Equallogic PS6100 New Feature: Vertical Port Failover

By admin, April 10, 2012 4:20 pm

I finally figured out what “Vertical Port Failover” is: a cool feature that maintains the original full 4Gbps bandwidth after a link or switch failure! Is it really useful? To be honest, I don’t think so, but it’s fun to have.

According to March 2012 Dell EqualLogic Configuration Guide v13.1 12:

In PS Series controllers prior to PS4100/6100 families, a link failure or a switch failure was not recognized as a failure mode by the controller. Thus a failure of a link or an entire switch would reduce bandwidth available from the array. Referring to Figure 5, assume that CM0 is the active controller. In vertical port failover, if CM0 senses a link drop on the local ETH0 port connection path, it will automatically begin using the ETH0 port on the backup controller (CM1). Vertical port failover is bi-directional. If CM1 is the active controller then vertical port failover will occur from CM1 ports to CM0 ports if necessary.

With PS4100/PS6100 family controllers, vertical port failover can ensure continuous full bandwidth is available from the array even if you have a link or switch failure. This is accomplished by combining corresponding physical ports in each controller (vertical pairs) into a single logical port from the point of view of the active controller. In a fully redundant SAN configuration, you must configure the connections as shown in Figure 20.

In a redundant switch SAN configuration, to optimize the system response in the event you have a vertical port failover you must split the vertical port pair connections between both SAN switches. The connection paths illustrated in Figure 6 and Figure 7 show how to alternate the port connection paths between the two controllers. Also note how IP addresses are assigned to vertical port pairs.
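
To make the quoted behaviour a bit more concrete, here is a minimal Python sketch of vertical port pairs. It is only a toy model, not Dell's firmware logic: the function names `effective_ports` and `usable_bandwidth_gbps`, the list-of-booleans layout and the 1Gbps-per-port figure are my own assumptions for illustration.

```python
# Toy model of vertical port failover (illustrative only, not Dell firmware logic).
# Each ETHn port on the active controller is paired with ETHn on the passive one.

def effective_ports(active_links, passive_links):
    """Return, per port index, which controller's port carries the traffic.

    active_links / passive_links: lists of booleans, True = link is up.
    Entries are "active", "passive" (failed over vertically) or None (no path).
    """
    result = []
    for active_up, passive_up in zip(active_links, passive_links):
        if active_up:
            result.append("active")
        elif passive_up:
            result.append("passive")
        else:
            result.append(None)
    return result


def usable_bandwidth_gbps(active_links, passive_links, gbps_per_port=1):
    """Sum the bandwidth of every logical port that still has a working path."""
    ports = effective_ports(active_links, passive_links)
    return gbps_per_port * sum(1 for p in ports if p is not None)


# ETH0 on the active controller loses its link; its vertical partner takes over.
print(effective_ports([False, True, True, True], [True, True, True, True]))
print(usable_bandwidth_gbps([False, True, True, True], [True, True, True, True]))
```

In this model the array only loses bandwidth when both ports of a vertical pair are down, which is why the guide wants the two ports of each pair cabled to different switches.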

HIT/VE, ASM/VE and VMware Thin Provision Stun (Equallogic FW v5.2.2) on ESX 4.1

By admin, April 9, 2012 1:11 pm

In case you don’t know, VMware Thin Provision Stun is actually the 4th VAAI feature, but it was hidden at the time of release at the end of 2010. In case you also don’t know (this is not directly related, though), I just learned that VMware’s HA technology actually came from Legato, which EMC (VMware’s parent company) acquired back then.

Since then, storage vendors have started integrating this great feature through their firmware upgrades, beginning in 2011.

Equallogic incorporated VMware Thin Provision Stun starting with firmware v5.1, but I found it was already there, hidden, as early as v5.0.2, and it worked out of the box even with ESX 4.1.

I didn’t notice much difference in the volume properties under Group Manager after I upgraded the PS6000XV to firmware v5.2.2. It was only later, when I added a new volume using HIT/VE (which requires FW v5.1 or above as well as MEM v1.1.0), that I discovered an extra option (Enable VMware Thin Provision Stun) to create the volume with VMware Thin Provision Stun.

Double-checking: log in to EQL Group Manager and the volume properties now have a new line:

TP warning mode: Leave online, generate initiator write error

You can also verify this by creating a test volume in EQL Group Manager; the thin provisioning mode options are now selectable in the new Equallogic firmware.

Oh… many may ask what exactly Thin Provision Stun does. Basically, when your Equallogic thin volume runs out of space (i.e., in-use space reaches 100% of the maximum), instead of the whole volume being taken offline and every VM on it crashing, ONLY the VMs that keep requesting more space are suspended. The VMs on the same volume that don’t need more space for the time being keep working without any problem.
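
Here is a small Python sketch of that behaviour as I understand it. It is a toy model only: the function `vms_to_stun`, the VM names and the per-VM space requests are made up for illustration and say nothing about how ESX or the array implement the feature internally.

```python
# Toy model of Thin Provision Stun (illustrative only).

def vms_to_stun(free_gb, pending_writes_gb):
    """Return the VMs that would be suspended, in request order.

    free_gb: unallocated space left in the thin volume.
    pending_writes_gb: dict of VM name -> new space (GB) the VM asks for.
    A VM asking for no new space keeps running; a VM whose request cannot
    be satisfied is stunned instead of the whole volume going offline.
    """
    stunned = []
    for vm, needed in pending_writes_gb.items():
        if needed == 0:
            continue                # no new blocks needed, VM is unaffected
        if needed <= free_gb:
            free_gb -= needed       # request fits, allocate and carry on
        else:
            stunned.append(vm)      # out of space: suspend only this VM
    return stunned


# Hypothetical VMs sharing one thin volume that is already 100% in use.
requests = {"web01": 0, "db01": 5, "file01": 0, "mail01": 2}
print(vms_to_stun(0, requests))
```

Once more space is added to the volume, the suspended VMs can be resumed, while the others never noticed anything.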

Back to the HIT/VE installation: the whole setup process is pretty simple. The only catch is that you need to create a new port group on the iSCSI vSwitch so that HIT/VE can access the Equallogic array. Many system administrators may consider this insecure.

One thing I really like besides the VMware Thin Provision Stun feature is ASM/VE. It OFFLOADS the backup process from the ESX or vCenter Windows server to the Equallogic array itself; think of it as a kind of VAAI offload for backup. Veeam’s approach is to offload the backup to its backup proxy, but Equallogic’s internal backup (snapshots) is still way faster, if you have the luxury of space to spare, of course.

The result (smart copy = snapshot) is still stored on your Equallogic box. To many this is not wise, as storing cold backup data on an expensive SAN doesn’t make sense. (You can’t store it outside the volume being backed up, so if your volume is on RAID10 or SSD, the snapshots have to live there as well. Is that still true? Maybe I’m wrong.)

There is no way to store the snapshot off host either. (Maybe there is: I saw a white paper showing how to use Symantec Backup Exec to offload the snapshots from the EQL box with ASM/VE.)

Oh, there is another great feature of ASM/VE: Smart Clone! It has real, unique value, and it’s extremely useful for application testing or for testing a major patch on your VM.

Finally, I really like the HIT/VE GUI, which is very simple and intuitive! Configuring a new Equallogic volume via HIT/VE is a piece of cake now: it creates everything for you automatically, so there is no more manually configuring iSCSI initiator access, rescanning VMFS, attaching volumes to ESX hosts, etc. A newly created VMFS will be ready within 5 clicks. That’s the beauty of Equallogic, and guess what? It’s free of charge as usual!

Running Mac OS X Lion 10.7.3 on VMware Workstation v8.0.2

By admin, April 9, 2012 1:50 am

The installation is pretty easy once you have found the correct link, also here. In fact, there is no installation involved, as the VMFS has already been fully configured. All you need to do is search for VMware Tools for Mac OS X, put it on a USB stick and install it from within the VM.

I gave Mac OS X the maximum configuration: 2 vCPUs with 2 cores each, so 4 vCPUs in total, and 8GB of RAM. It runs so smoothly it’s almost like native, and it’s lightning fast on SSD: it took only 5 seconds to boot into the following screen. The good thing is I can run it in full screen, watch HD movies, play games and download apps. I noticed the CPU load on my Optiplex 990 SFF i5-550 is almost 90% across all 4 cores when the VM is fully loaded.

I haven’t touched Apple’s OS for almost 12 years, so it’s exciting to see the familiar face again in a virtual world.

I’m really starting to love VMware Workstation for the unlimited possibilities it offers. My next targets will be nested hypervisors, an ESX 5 cluster, View 5, and plugging my iPhone into this Mac VM and using iCloud to sync stuff. Cool!

[Screenshot: macosx]

Thoughts about Dell Management Plug-In for VMware vCenter (DMPVV)

By admin, April 9, 2012 12:12 am

Honestly, I have had to deploy DMPVV multiple times in order to get it right.

1. You really need to make sure you have read the Dell OpenManage Software Compatibility Matrix before installing any OM software, because you need to upgrade the BIOS/iDrac/Lifecycle firmware to the respective minimum versions stated in the guide.

2. DMPVV DOES NOT NEED a DHCP server

I even created a W2K8 R2 DHCP server, but then found there is a place in the menu to configure a fixed IP for the appliance.

3. DMPVV cannot start in vCenter, reporting some weird permission problem, things like Access Denied!

The problem was that I registered vCenter by IP in DMPVV but used the host name when I logged in with the vSphere client. After I changed the host name to the IP, everything worked. Probably my DNS is not working properly; anyway, IP is fine in my case.
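
If you suspect the same host name vs. IP mismatch, a quick sanity check is to see whether the name you type into the vSphere client resolves to the IP you registered in DMPVV. Below is a minimal Python sketch; the function `name_matches_ip`, the stub resolver and the host name/IP values are all hypothetical and exist only for illustration. By default it uses the standard `socket.gethostbyname` call.

```python
import socket


def name_matches_ip(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname resolves (IPv4) to expected_ip, else False."""
    try:
        return resolver(hostname) == expected_ip
    except OSError:          # socket.gaierror is a subclass of OSError
        return False


def stub_resolver(name):
    """Stand-in for real DNS so the example runs anywhere (hypothetical data)."""
    table = {"vcenter.example.local": "192.168.1.10"}
    if name not in table:
        raise socket.gaierror("name not found")
    return table[name]


print(name_matches_ip("vcenter.example.local", "192.168.1.10", resolver=stub_resolver))
print(name_matches_ip("vcenter", "192.168.1.10", resolver=stub_resolver))
```

Drop the `resolver` argument to test against your real DNS. If the check fails, that would at least be consistent with my guess that broken DNS caused the Access Denied error.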

4. Connection Profile doesn’t work; then I found out you have to turn on Remote Enablement during the installation of OMSA on ESX.

The reason you see that error message about “OMSA is not installed” could be that when you installed OMSA, you didn’t install it with the -c option, which installs the “Remote Enablement” component of OMSA. And our appliance talks to OMSA through its remote enablement layer. Without a successful connection to OMSA, the iDrac connection will fail too, as we correlate the correct iDrac IP with the server by getting the iDrac IP from OMSA first. Please reinstall OMSA with the -c option and that should solve your issue. Once you pass the connection test from the connection profile, please make sure to run inventory from the Job Queue by clicking “Run Now”.

That indeed was the fix, and the -c is listed in the user guide for the command line, however there is no explanation in the user guide why it needs to be there, so I did not re-run the OMSA 6.4 installer. Perhaps the OMSA team could add the -c switch to the -x (for express) switch for OMSA, so that it is automatically included? Also according to the OMSA 6.4 install manual, if you run the -x switch it runs the express setup with all options included and ignores any other switches, apparently this is not true.

For OMSA 7.0

Run the following command to perform an express install with Remote Enablement parameters:
sh linux/supportscripts/srvadmin-install.sh -c -x

-c is for Remote Enablement
-x is for Express

Then start the applicable services by running the following command:
sh linux/supportscripts/srvadmin-services.sh start

5. License Disappeared

Sometimes after a reboot or a DMPVV reset to factory defaults, the license disappears and I have to re-deploy the whole thing again. Well, the last successful re-install only took me 5 minutes, as I have done it over 6 times. :)

6. CANNOT contact iDrac (SOLVED, but UNSOLVED for the time being)

My iDRAC subnet is on a separate switch, so the COS (Service Console) obviously can’t reach it. This is by design, as I want to physically separate all network segments, and it’s not routable. I know that for the default DMPVV setup to work you have to put the COS and DRAC networks on the same network segment, which is NOT SECURE as far as I am concerned. Why doesn’t DMPVV give us an option to specify the subnet for iDrac and add another network adapter for this purpose?

During my research, I also found out that by pressing Alt+F2 and logging in as readonly with the default admin password, you can perform some network troubleshooting such as ping and tracepath.

Anyway, I still can’t figure out a way to route the traffic from DMPVV to iDrac via the vCenter server without using an L3 router or firewall device. Is it possible to use route add on the vCenter Windows server to redirect the DMPVV traffic to iDrac? If you know, please let me know.

So I was not able to test the firmware upgrade feature, but I am 100% sure it uses iDrac’s USC firmware update feature to fetch firmware from ftp.dell.com and then perform the upgrade in the background, the same as if you rebooted the server and pressed F10 for USC.

7. Service Temporarily Unavailable

The DMPVV web server kept crashing, probably because I gave it only 1GB of RAM (reduced from 3GB). After I changed it to 2GB it stopped crashing, but loading the host page is still extremely slow, around 2 minutes. Oh, DMPVV is a resource eater, taking up the full 2GB of RAM and 100% of the CPU when it’s connecting to the host.

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

Apache/2.2.3 (CentOS) Server at xxx.xxx.xxx.xxx Port 443

8. Proxy to get firmware

This is the same issue as point 6 above; see, it comes back to bite anyway. There is no direct connection to the Internet from within DMPVV, so you have to use a proxy server to download firmware. A better way would be to add a 3rd network adapter connected to your external port; I hope this can be changed in the next release.

Finally, some good readings (total 3 parts) can be found on Virtual Life Style web site.

One of the coolest features I want to highlight is the PXE-less provisioning of the hypervisor to a physical server. This uses a combination of the Lifecycle Controller and iDRAC to deploy an installation ISO to the server. And since it is really tightly integrated with the VMware stack, the host is added to vCenter and configured using Host Profiles automatically, resulting in a true zero-touch deployment of a server. How cool!

In fact, there is a How to video regarding “Auto Discovery & Hypervisor Deployment – Dell Management Plug-In for VMware vCenter ”

One last word: I do think Dell Management Plug-In for VMware vCenter (DMPVV) is simply a proxy between the ESX host and vCenter. Still, it should be made free, as all the features of DMPVV can be achieved by using different Dell server management products together. DMPVV is just a fancy toy that combines all of them into a single product.

FYI, 12th generation servers such as the R720/R620 don’t have to use OMSA: they are completely agent-less, so DMPVV no longer depends on OMSA agents within the ESX hosts in order to work.

Dell Management Plug-In for VMware vCenter (v1.5) is FREE!

By admin, April 5, 2012 4:40 pm

Yes, it’s FINALLY free (for 1 host only, though). The manual can be found here, and it’s available for download now. (It’s a 2GB zip, and I had to use FileZilla directly as the browser download always breaks and can’t resume.) One host is more than enough for me: I simply switch hosts manually one after another, as all I want is to update the firmware of PowerEdge servers running ESX. :)

I really do hope Dell will make Dell Management Plug-In for VMware vCenter completely free of charge, as it can only be used on Dell’s PowerEdge servers, not IBM’s or HP’s, so the users are essentially Dell’s customers anyway, and it’s a management tool that everyone needs!

Dell Management Plug-In for VMware vCenter Feature List

Features Description
Inventory Detail Complete PowerEdge Server Details

  • Memory – Quantity and Type
  • NIC
  • PSU
  • Processors
  • RAC
  • Warranty Info
  • Server- and Cluster-level Views
BIOS and Firmware Update Deployment BIOS and Firmware

  • Baselines and Templates
  • Updates staged from VMware vCenter
Built-in Deployment Wizard Dell Servers Show up as a Bare Metal Server

  • Set configs of BIOS and Firmware updates

Profile Settings and Template

  • RAID
  • Server Name
  • IP Address

Hypervisor Templates for ESX and ESXi 4.1 and later release

Alert Control Level

  • Set Manual Approval Requirement
  • Allow the Tool to Automatically Remediate
Online Warranty Info Server Warranty Info via VMware vCenter

  • Service Provider
  • Warranty Type
  • Service Dates on Server or Cluster Level
Cluster Level Overview of Dell Servers

  • High Level Summary
  • Expanded View
  • Firmware
  • Warranty
  • Power
  • Reports
    • Sortable
    • Filterable
    • Exportable to CSV format

Veeam vs vRanger, The Battle is On, Round 1, KO!

By admin, April 5, 2012 4:15 pm

Round 1, KO!

I saw Anton from Veeam KO vRanger last year on VMTN. It may not be professional, but I like seeing the pros and cons of both products, and consumers have the right to know the negative side of each product.

Today I came across several articles posted by vRanger citing Veeam’s drawbacks and pitfalls and calling Veeam “a small company from Russia” (that’s a bit much, and mean really). Well, I would say the points are pretty true, and I am not sure whether they have been taken care of in B&R v6, as I am still using v5. Anton, if you see this, please let me know whether v6 has addressed the following.

Btw, I know you react really quickly to posts about Veeam around the web. I am really not sure how you find all of them so quickly. Is it by magic? :)

Criticism of Veeam from vRanger can also be found here and there.

Don’t worry, Anton, I have confidence in Veeam and I am sure you don’t mind this being discussed in public.

How a Poorly-Designed Architecture for Data Backup will Undermine a Virtual Environment — A Close Look at Veeam by Jason Mattox
Posted by kellyp on Jun 1, 2010 1:00:00 AM

At Vizioncore, we do not often cite our competitors by name in public. Our philosophy is that it is our job to provide expertise on virtual management requirements and the capabilities offered by Vizioncore for addressing those requirements.

However, members of our team also do have occasion to look in depth at competitive products. When the result is a fact-based assessment of how a competitor’s approach contrasts with Vizioncore, it seems to serve the larger community to put the information in public. The purpose of this is to help members of the community to better understand the real differences in approach and to better appreciate the value built into the Vizioncore product portfolio.

In this case, the competitor that we looked at is a small company called Veeam. Veeam offers an all-in-one product for backup, replication and recovery of VM images. They are privately held, based in Russia, and report having about 6K customers as of now. This compares to Vizioncore’s 20K+ customer level as announced in March 2010, with Vizioncore operating as a wholly-owned subsidiary of Quest Software. Quest is a public company, obligated to provide audited reporting of company financials.

The comparison and contrast between Veeam’s implementation of image-based backup and restore and vRanger Pro 4.5 appears below. This analysis is written by Jason Mattox, one of the co-inventors of the original vRanger Pro product. Jason continues to provide guidance and direction to new versions of Vizioncore technology products, including vRanger Pro 4.5.

We hope that you find Jason’s comments and insights educational. In a part 2 of this posting, we will offer more details on how vRanger Pro 4.5 and the Data Protection Platform (DPP) foundation on which it is built, contrasts with Veeam’s approach.

**************************************************

I have had the opportunity to take a deep, first-hand look at the Veeam 4.1.1 product over the last few weeks. My personal opinion – admittedly biased – is that they have a product built on a poor foundation. The problems with their architecture – and the potential result of the data protection not operating well and actually undermining an organization’s virtual environment – include the following:

Pseudo service-based architecture: You install the product, it installs itself as a service, and you think, “okay good it’s a service based architecture.” But it’s not: Here is a simple test you can do on your own to prove the product is not a full service-based architecture. Start a restore job. Then log off Windows; the product will ask you: “Are you sure?” This is because if you log off Windows, it will cancel your running restores since it’s not running through the service. Another test you can try is to attempt backup and restore at the same time; you cannot. If the product was a true service based architecture, your backup and restore jobs would just be submitted to the service and the product wouldn’t care about the two functions running at the same time.

Lack of data integrity in the backup archive: Create a job in their product that contains, for example, 20 VMs. When you back up all the VMDKs, then Veeam puts all of the backup data into a single file.  Also, when you run incremental backups, they update this single large file directly.

When you have a single large file that needs to be updated the chance of corruption is high. Database manufacturers know this; products like SQL and Exchange write all their changes to log files first and then on a controlled event they post the changes to the single large file, the DB. Veeam does not implement this best practice, but rather updates a single 30, 40 or even 500 GB file directly instead of staging the data next to the file, then posting the data to the file once successful.
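
The “stage the data next to the file, then post it once successful” practice the article describes can be sketched in a few lines of Python. This is a generic illustration only, not how vRanger, Veeam, SQL or Exchange actually work: the function `staged_update` and the file name are hypothetical, and for simplicity it rewrites the whole file rather than applying a log of changes.

```python
import os
import tempfile


def staged_update(path, new_bytes):
    """Write new_bytes to a temp file beside path, then swap it in atomically.

    If anything fails before the swap, the original file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())      # make sure the staged data is on disk
        os.replace(tmp_path, path)      # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)         # clean up the staging file on failure
        raise


with tempfile.TemporaryDirectory() as repo:
    archive = os.path.join(repo, "job.bak")   # hypothetical backup file name
    staged_update(archive, b"full backup")
    staged_update(archive, b"full backup + increment 1")
    with open(archive, "rb") as f:
        print(f.read())
    print(os.listdir(repo))                   # only job.bak remains
```

The point of the pattern is that a crash in the middle of a write can only damage the staging file, never the existing archive.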

This is their Synthetic Full implementation – the entire basis for their product – and why we object to it so strenuously in terms of the risk that it introduces into customer environments.

Their argument in favor of Synthetic Full appears to have been that it enables backups to be faster. We believe that there are other, better methods available for speeding backup which do not risk the integrity of the backup repository. Methods including Active Block Mapping (ABM), now shipping in vRanger Pro 4.5. In beta test environments, our testers have reported that vRanger Pro backup is far faster than Veeam. However, your mileage will vary and we welcome more test reports from organizations testing both products.

Another argument in favor of Synthetic Full which has been offered by Veeam, is that it helps speed restore. Again, we agree with the goal but not with the method used to get there.

In vRanger Pro, we offer a Synthetic Restore process which has been in the product for some time. Our restore has been faster than Veeam’s for as long as we’ve been aware of Veeam. Our performance on restore was also improved in the 4.5 release, to be even faster than before.

Problems with updating a single file in the backup repository: Those of you familiar with database implementations – and the very good reasons for staging updates rather than writing them directly – will understand some of these problems immediately. This approach is especially problematic for image-based backup, and I’d like to offer some reasons as to why:

Tape space requirements – Because the original file is updated with every backup pass, the entire file must be written to tape every time. There is no method offered for moving just the new data to tape. This makes the ’sweep-to-tape’ process lengthy, and increases the number of tape cartridges required significantly. Tape management is, likewise, more difficult. The process of locating tapes, scanning to find data, and performing restore is, likewise, more difficult and lengthy.

Problems working with Data Domain de-duplication storage and similar storage appliances – Because the original file is amended with every backup pass, the appliance cannot be efficient in de-duplicating and replicating the backup data.

Finding and restoring individual VMs from the backup job – Because the backup file includes more than one VM, it is not named intuitively to enable easy browse and restore of the VMs required by the admin.

Overhead in the process of creating and managing simultaneous backup and recovery jobs – it’s just harder to do:  In their product, if you create a single backup job of let’s say 30 VMs they will backup one VM at a time. To perform more backup jobs at one time, you must create more jobs.  For each job, you must step through the entire backup wizard, which is time-consuming. The same holds true for restore jobs: for each VM, you must step through the entire restore wizard to create and submit a job to restore a VM. This isn’t that bad most of the time, but for disaster recovery scenarios or in situations in which entire ESX servers must be rebuilt, this simply isn’t that practical.

Feature called de-dupe is really something else: De-dupe in their product is not true de-dupe, but is perhaps better described as a template-based backup. Here’s what they do: they define a base VM – this being a set of typical files or blocks typically found in a VMDK – and they use this as the comparison for their full backup of the VM. For example, if you have two Windows guests then they do not have to back up the Windows configuration because it is already in the base VM template.

However, there are some important limitations of their approach which include:

Their de-dupe is only good within the backup job. The more jobs you create, the less beneficial the de-dupe is because blocks are duplicated between and among backup jobs. If you need to create more backup jobs to gain better backup performance across multiple VMs, then the de-duplication benefit goes down.

Their de-dupe is defined with a base VM – and does not change with the configuration of the guests. If you have two Exchange servers being protected in the same job, then all of the blocks for the Exchange configuration will be included twice – even if they are identical.
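
The first limitation (de-dupe that only works within a job) is easy to see with a little Python. This sketch only illustrates the article’s claim and does not verify it against any real product: the block names, VM layout and both function names are invented, and each VM is simply modelled as a set of block hashes.

```python
# Compare how many unique blocks are stored when de-duplication is scoped
# to each backup job versus applied globally (illustrative model only).

def stored_blocks_per_job(jobs):
    """Blocks kept when duplicates are removed only inside each job."""
    return sum(len(set().union(*vms)) for vms in jobs)


def stored_blocks_global(jobs):
    """Blocks kept when duplicates are removed across all jobs."""
    all_blocks = set()
    for vms in jobs:
        for vm in vms:
            all_blocks |= vm
    return len(all_blocks)


os_blocks = {"os1", "os2", "os3"}        # blocks shared by every guest
vm_a = os_blocks | {"a1"}
vm_b = os_blocks | {"b1"}
vm_c = os_blocks | {"c1"}

one_job = [[vm_a, vm_b, vm_c]]           # all three VMs in a single job
three_jobs = [[vm_a], [vm_b], [vm_c]]    # one VM per job for more parallelism

print(stored_blocks_per_job(one_job))
print(stored_blocks_per_job(three_jobs))
print(stored_blocks_global(three_jobs))
```

With per-job scope, splitting the VMs into more jobs stores the shared OS blocks once per job, while a global scope stores them once no matter how the jobs are arranged.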

Our own implementation of de-duplication is pending delivery later this year. We have developed true global, inline deduplication designed to offer maximum de-duplication benefits. It’s in test now. Our architecture, which includes keeping backup files intact and untouched once written into the repository, has been a key in enabling our de-duplication to function with true de-dupe capabilities.

Lack of platform scalability: To scale out their product in virtual application mode, LAN free mode, ESXi network or ESXi replication, they have to install their product many times. To make it possible to manage all of the deployments, they offer an API layer and provide an ASP.NET web page so that customers can go to check job status for their many installs. This console does not allow you to create or edit jobs, but is a monitor. They call this their Enterprise console.

ESXi replication is network-exhaustive:   In their implementation for ESXi, their product reads from the vStorage API over the network uncompressed to their console, then it writes the resulting data over to the other target ESXi host over the vStorage API uncompressed.

What’s wrong with this? In the first place, the vStorage API was not designed for the WAN link; it was designed for backups which were meant for the LAN. The other issue is that the traffic is uncompressed; WAN links are not cheap so compression is a key feature that’s needed. Also, if you look at the resources needed for this, just a single replication job can consume 50-80% CPU of a 2 CPU VM. So if you think about how you would scale this out from a bandwidth and installation point of view, this doesn’t seem practical.

Use of unpublished VMware API calls:  If you have ever used the Datastore browser from the vCenter client, this process uses an internal API that’s not exposed to 3rd parties called the NFC. Here is what they have done: they are impersonating the vCenter client and using the internal NFC API to work with VMs.

So, here’s the risk:  VMware may trace a reported problem with a VM back to a 3rd party product that is using an unpublished vCenter API by impersonating the vCenter client. Will VMware be okay with this? Might VMware get a little strange with you and their ability to support you and your environment?

If you want to verify this for yourself just look in the logs of their product for “NFC”, look at the target datastore for files that are not VMware related. Ask them how they transfer and modify files in the datastore that are not your normal VMware files.

Why the stakes are high in virtual environments: Virtual environments are some of the fastest-growing and most dynamic environments in the world. As virtual servers continue to gain momentum in terms of their adoption rate, administrators are presented with the big challenge of keeping ever-expanding virtual resources monitored and under efficient management. At Vizioncore, we want to enable this momentum to continue by offering data protection and management capabilities which are purpose-built for images, with foundational capabilities designed to ensure that protection methods are — and remain — affordable, resource-efficient, and easy to use and operate. No matter how large your virtual environment grows.

Update April 7, 2012

Got a reply from Veeam’s support. Their suggestion is to use the Full Backup method instead of Synthetic Backup, which kind of avoids answering my question directly. Well, this part does suck anyway: after all, nobody wants to copy a hundred-GB vbk every day after the backup session over a 100Mbps or even a 1Gbps link. Another problem is that Full Backup tends to keep a lot more retention copies than Synthetic, which uses more space as well.
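
To put a number on that, here is a rough back-of-the-envelope calculation in Python. It assumes a 100 GB file (decimal gigabytes) and an ideal link with no protocol overhead; the function name `transfer_hours` and the `efficiency` parameter are mine, and real-world throughput will be lower.

```python
def transfer_hours(size_gb, link_mbps, efficiency=1.0):
    """Hours needed to copy size_gb (decimal GB) over a link of link_mbps.

    efficiency is the fraction of the nominal link speed actually achieved
    (1.0 = ideal; an assumption, not a measured value).
    """
    megabits = size_gb * 8 * 1000            # GB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


for link in (100, 1000):
    print(f"100 GB over {link} Mbps: {transfer_hours(100, link):.2f} h")
```

Even in the ideal case, the 100Mbps link needs more than two hours per copy, every single day.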

During the first run of a forward incremental backup (or simply incremental backup), Veeam Backup & Replication creates a full backup file (.vbk). At subsequent backups, it only gets changes that have taken place since the last performed backup (whether full or incremental) and saves them as incremental backup files (.vib) next to the full backup.

Incremental backup is the best choice if company regulation and policies require you to regularly move a created backup file to tape or a remote site. With incremental backup, you move only incremental changes, not the full backup file, which takes less time and requires less tape. You can initiate writing backups to tape or a remote site in Veeam Backup & Replication itself, by configuring post-backup activities.

Strange High Latency (Read) After Equallogic Firmware Upgrade (Solved!)

By admin, April 3, 2012 2:30 pm

I performed the firmware upgrade today on one of the PS6000XVs to the latest v5.2.2.

Everything worked as it should: the VMs stayed as solid as steel (no pings lost), the controller failover took about 10 seconds (i.e., pings to the group IP had a 10-second blackout) and the whole upgrade took about 10 minutes to complete, as expected.

Caution: Controller failed over in member myeql-eql01

Caution: Firmware upgrade on member myeql-eql01 Secondary controller.
Controller Secondary in member myeql-eql01 was upgraded from firmware version Storage Array Firmware V5.0.2 (R138185)

Caution: Firmware upgrade on member myeql-eql01 Primary controller.
Controller Primary in member myeql-eql01 was upgraded from firmware version to version V5.2.2 (R229536)

However, various problems started to occur after the upgrade, mainly high TCP Retransmit, high disk latency (read) and a failed fan on the active controller module. Besides, the EQL battery temperature also went up by 5 degrees compared to its original state. (Something going on in the background is contributing to this rise, for sure.)

1. High TCP Retransmit (SOLVED)

The IOMeter benchmark dropped by almost 90% and high TCP Retransmit started to occur. I re-installed MEM on the ESX hosts and rebooted: still the same.

Then I rebooted the PowerConnect 5448 switches one by one, and this solved the problem completely. But why would an Equallogic firmware upgrade require the switches to be rebooted? Was something cached in the switch, ARP, MAC? I really don’t know; maybe this is the time we say “Wow, It Worked~ It’s Magic!”

2. High Disk Read Latency (Remains UNSOLVED)

This PS6000XV used to have latency below 6ms; it’s now 25-30ms on average. The funny thing is that whenever the IOPS is extremely high, in the 9,000 range (I use IOMeter to push my array to its max), the latency becomes really low, in the 5ms range.

Vice versa, whenever the IOPS is extremely low, 5 to 70 after I stopped IOMeter, the latency jumps sky high into the 120-130ms range.

All of this was observed using the Live View tool in the latest SAN HQ v2.2, which I really like!

The latency of all individual volumes added together is still 5-6ms, so where does the extra 20-something ms of HIDDEN latency come from?

I contacted US EQL support, as local support has no clue whatsoever as usual. The engineer told me it could be due to a metadata remapping process going on in the background after the firmware upgrade, and that I need to wait from a few hours up to 24 hours for it to go back to normal. To be honest, I’ve never heard of such a thing, nor could I find anything about it on Google (i.e., disk metadata needing to be remapped after a firmware upgrade).

OK, things are still the same after almost 48 hours, so I doubt this is the problem; ps aux shows no such process running on the array.

Remember, my controller temperature also went up by almost 25%, indicating something is working heavily on the storage processor. Could this be another good indicator that my PS6000XV is still doing some kind of background metadata integrity checking, and that whenever it senses IOPS is low it boosts the checking process, so we see that high latency?

Anyway, this problem remains a mystery. I don’t have any performance issue, so it can be ignored for the time being. I think only time will tell, when the background disk I/O thing completes its job and the latency hopefully goes back to normal.

In fact, I hate to say this, but I strongly suspect it’s a bug in Equallogic firmware v5.2.2. If you have any idea, please drop me a line, thanks.

3. Fan of Active Controller Module Failed (SOLVED)

When the active controller failed over, Module 0 Fan1 was reported as failed. It turned out to be a FALSE ALARM: the fan returned to normal after a second manual failover.

Oh…the ONLY good thing to come out of the whole firmware upgrade is that TCP Retransmit is now 0% for 99.9999% of the time, and I do sense IOPS is 10% higher than before as well.

I saw a single spike of 0.13% only once in the past 24 hours, um… IT’S JUST TOO GOOD TO BE TRUE, and SOUNDS TOO SUSPICIOUS to me, as TCP Retransmit used to be in the 0.2% range all the time.

Update May 1, 2012

The mystery has been SOLVED finally after almost 1 complete month!

I disabled Delayed Ack on all the ESX hosts in the cluster and rebooted the hosts one by one. The high latency issue is gone for good! It’s back to the normal 3.5-5.0ms range (after 12:30pm).

fixed

The high read latency problem was indeed due to Delayed Ack, which is enabled on ESX 4.1 by default. As Don Williams (EQL’s VMware specialist) also stated, Delayed Ack adds artificial (or fake) extra latency to your Equallogic SAN, and that’s why we saw those false positives in SANHQ.

In other words, SANHQ was deceived by the fake latency number induced by Delayed Ack, which leads to the strangeness of this problem.
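One way to picture why the latency looked worst when IOPS was lowest is the toy model below. It is only a sketch under loud assumptions: the 100ms timer value and the rule that a held ACK is released when the next I/O arrives are both made up for illustration, and none of the names come from ESX, SANHQ or the Equallogic firmware.

```python
# Toy model only. The timer value and the "ACK is released by the next
# segment or by the timer" rule are ASSUMPTIONS for illustration, not
# measured ESX or Equallogic behaviour.

ASSUMED_ACK_TIMER_MS = 100.0  # hypothetical delayed-ACK timeout


def added_latency_ms(iops: float, ack_timer_ms: float = ASSUMED_ACK_TIMER_MS) -> float:
    """Extra wait per I/O: the ACK is held until the next I/O arrives
    (roughly 1000 / iops milliseconds later) or until the timer fires."""
    if iops <= 0:
        return ack_timer_ms
    gap_to_next_io_ms = 1000.0 / iops
    return min(ack_timer_ms, gap_to_next_io_ms)


def observed_latency_ms(base_ms: float, iops: float) -> float:
    """Latency a monitoring tool would report under this toy model."""
    return base_ms + added_latency_ms(iops)


if __name__ == "__main__":
    # Base latency of 5 ms is taken from the per-volume figure in the post.
    for iops in (5, 70, 1000, 9000):
        print(f"{iops:>5} IOPS -> {observed_latency_ms(5.0, iops):.2f} ms")
```

Under these assumptions the extra latency shrinks as the load goes up, which has the same shape as what SANHQ showed, although the real numbers depend on timers I did not measure.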

It has nothing to do with our switch settings. Still, this doesn’t explain why EQL firmware v5.0.2 or earlier doesn’t have this problem, so it might still be related to a firmware bug in v5.1.x or v5.2.x that triggers the high latency issue on ESX/ESXi hosts with Delayed Ack enabled (the default).

Finally, IOmeter shows a 10-20% increase in IOPS after upgrading to firmware v5.2.2 (with or without Delayed Ack disabled).

Again, I greatly appreciate the help from Equallogic’s technical support team. We did many direct WebEX sessions with them, and they were always patient and knowledgeable and knew their stuff, especially Ben, the EQL performance engineer. He also gave me an insightful lecture on using SANHQ, and I learnt many useful little things that will help me troubleshoot my environment in the future!

And of course Joe, who’s responsible for EQL social media. It proves Equallogic is a great company that cares about its customers, no matter how large or small; this really makes me feel warm. :)

Equallogic Arrays are NetBSD based?

By admin, April 3, 2012 1:24 pm

I had a WebEX session with US Equallogic Support yesterday, and I saw the following when he had direct access to the EQL PS6000XV console.

cli-child-3.2# uname -a
NetBSD 1.6.2 NetBSD 1.6.2 (EQL.PSS)

Also, during the boot process of the PS6000XV, the active controller module shows 4 cores:

All slave cpus (16) ack’ed userapp init
count = 4, total =4
All slave cpus (4) ack’ed message ring init

For a list of undocumented Equallogic CLI commands, see this link.

I was also told the CPU of the PS4000 series is dual-core, and that’s why it’s slower than the PS6000 series in terms of performance.

Upgrade Equallogic MEM to v1.1.0

By admin, March 31, 2012 5:10 pm

The reason I chose to upgrade MEM first is that the release notes of MEM 1.1.0 say the minimum EQL FW is v4.3.7 (I’m on v5.0.2), while FW v5.2.2 states clearly that it only supports MEM 1.1.0. So I need to upgrade MEM first, before I upgrade my EQL firmware to v5.2.2 and add HIT/VE v3.1.1 later.
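The ordering logic can be written down as a small check. This is just a sketch of my own reasoning: the two version constraints are the ones quoted from the release notes, while the function names and step strings are made up for illustration.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v5.0.2' or '5.0.2' into (5, 0, 2) so versions compare correctly."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


# Constraints as quoted in the post.
MEM_110_MIN_FW = parse_version("4.3.7")       # MEM 1.1.0 needs FW >= 4.3.7
FW_522_REQUIRED_MEM = parse_version("1.1.0")  # FW 5.2.2 only supports MEM 1.1.0
TARGET_FW = parse_version("5.2.2")


def upgrade_order(current_fw: str, current_mem: str) -> List[str]:
    """Return the upgrade steps in a safe order (hypothetical helper)."""
    fw = parse_version(current_fw)
    mem = parse_version(current_mem)
    steps: List[str] = []
    if mem < FW_522_REQUIRED_MEM:
        if fw < MEM_110_MIN_FW:
            raise ValueError("firmware is below the minimum for MEM 1.1.0")
        steps.append("upgrade MEM to 1.1.0")
    if fw < TARGET_FW:
        steps.append("upgrade firmware to 5.2.2")
    return steps


if __name__ == "__main__":
    # My starting point: FW v5.0.2 with MEM 1.0.0 installed.
    print(upgrade_order("v5.0.2", "1.0.0"))
```

Comparing versions as tuples of integers avoids the string-comparison trap where "4.10.0" would sort before "4.3.7".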

1. Remember to put your ESX host to Maintenance Mode FIRST!!!

2. Use the MEM installation script, as I found this is the simplest method. Open vSphere CLI:

C:\>setup.pl --install --server=xxx.xxx.xxx.xxx
You must provide the username and password for the server.
Enter username: root
Enter password:

Upgrading from existing Dell EqualLogic Multipathing Extension Module installed:
DELL-eql-mem-1.0.0.130413
Defaulting to offline bundle 'C:\dell-eql-mem-esx4-1.1.0.222691.zip'
The install operation may take several minutes.  Please do not interrupt it.
Upgrade install was successful.
You must reboot before the new version of the Dell EqualLogic Multipathing Extension Module is active.

3. Remember to Reboot it as well if your host is ESX 4.1

4. Check that MEM is installed correctly. I found that the MEM settings had been reset to their defaults, so I needed to change them back to the maximum to utilize all 4 paths (i.e., VolumeSessions=12, MemberSessions=4).

C:\>setup.pl --query --server=xxx.xxx.xxx.xxx
You must provide the username and password for the server.
Enter username: root
Enter password:

Found Dell EqualLogic Multipathing Extension Module installed: DELL-eql-mem-esx4-1.1.0.222691
Default PSP for EqualLogic devices is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03b90b5b1d654e6e700d0fc is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078c06ba23424c914a0f1889d68 is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03bd0aad3cc34ade700500f is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03b80201dcbe4a5e70010ee is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03b40e895cc74abe70000fd is DELL_PSP_EQL_ROUTED.
Found the following VMkernel ports bound for use by iSCSI multipathing: vmk2 vmk3 vmk4 vmk5

C:\>setup.pl --listparam --server=xx.xxx.xxx.xxx
You must provide the username and password for the server.
Enter username: root
Enter password:

Parameter Name  Value Max   Min   Description
--------------  ----- ---   ---   -----------
TotalSessions   512   1024  64    Max number of sessions per host.
VolumeSessions  6     12    1     Max number of sessions per volume.
MemberSessions  2     4     1     Max number of sessions per member per volume.
MinAdapterSpeed 1000  10000 10    Minimum adapter speed for iSCSI multipathing.

C:\>setup.pl --setparam --name=VolumeSessions --value=12 --server=xxx.xxx.xxx.xxx
C:\>setup.pl --setparam --name=MemberSessions --value=4 --server=xxx.xxx.xxx.xxx

* Make sure you have 4 NICs for iSCSI before setting MemberSessions to 4 or you will see high TCP Retransmit.

5. Double check the storage path in vCenter to make sure MEM is the default PSP and all 4 paths are Active.
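If you have many hosts, the settings in step 4 and part of the double check could be scripted roughly like this. It is a minimal sketch with some assumptions: the setup.pl options are taken to be double-hyphen flags (--setparam, --name, --value, --server), the value ranges are copied from the listparam output earlier in this post, wrapped console lines are assumed to be re-joined before parsing, and both function names are my own. It only looks at the PSP and the bound VMkernel ports in the query output; whether all paths are Active still has to be checked in vCenter.

```python
import re
from typing import Dict, List, Tuple

# (min, max) ranges copied from the listparam output in this post.
PARAM_RANGES: Dict[str, Tuple[int, int]] = {
    "TotalSessions": (64, 1024),
    "VolumeSessions": (1, 12),
    "MemberSessions": (1, 4),
    "MinAdapterSpeed": (10, 10000),
}


def setparam_args(name: str, value: int, server: str) -> List[str]:
    """Build the setup.pl argument list, refusing out-of-range values."""
    low, high = PARAM_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}..{high}")
    return ["setup.pl", "--setparam", f"--name={name}",
            f"--value={value}", f"--server={server}"]


def query_looks_good(query_output: str,
                     expected_psp: str = "DELL_PSP_EQL_ROUTED",
                     expected_ports: int = 4) -> bool:
    """Check every device uses the expected PSP and enough vmk ports are bound."""
    psps = re.findall(r"Active PSP for \S+ is (\w+)\.", query_output)
    ports_line = re.search(r"iSCSI multipathing:(.*)", query_output)
    ports = re.findall(r"vmk\d+", ports_line.group(1)) if ports_line else []
    return (bool(psps)
            and all(psp == expected_psp for psp in psps)
            and len(ports) == expected_ports)


if __name__ == "__main__":
    print(" ".join(setparam_args("VolumeSessions", 12, "xxx.xxx.xxx.xxx")))
    print(" ".join(setparam_args("MemberSessions", 4, "xxx.xxx.xxx.xxx")))
```

The argument list can be handed to something like subprocess.run together with your Perl interpreter; I left that part out because it depends on how vSphere CLI is installed on your machine.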

OpenManage 6.5 Update on ESX 4.1, Poor Equallogic Local Support!

By admin, March 31, 2012 1:56 pm

Today I finally got time to update some of the major components such as OMSA, EQL MEM, EQL FW, Dell BIOS, etc.

The first thing I did was upgrade OpenManage to the latest version 6.5 on the ESX 4.1 hosts.

The whole thing is actually very easy: simply download OM-SrvAdmin-Dell-Web-LX-6.5.0-2247_A01.15.tar (762MB). Yeah, I know the Dell site is confusing, but this is the right one to download. Then install the package according to the Short Installation Guide on the download page.

The setup script will actually perform an automatic upgrade. Oh…there are some warnings saying that something, such as symbolic links, is not found. But don’t worry, just remember to start the OMSA (wsman) service again manually. There is no need to reboot the host or even put it into maintenance mode; everything was performed live without any problem.

However, there are two minor warnings:
1. “Unsupported browser version. See the User’s Guide for minimum supported browser versions.”
OK, I am using the latest Firefox v10 and it’s not compatible. Maybe OpenManage 6.5 A2 or OpenManage 7.0 will support it; I found both versions in the Document section but nothing on the download page, strange.
2. Dell IT Assistant starts to show a Yellow Exclamation Mark and OpenManage starts to complain that my H700 firmware is old and not updated. That’s OK, I will do it later.

Next, it’s time for me to upgrade MEM to v1.1.0 and then EQL FW to v5.2.2. Well, I’ve waited almost a full year since my last update of the EQL components.

Strangely enough, local Dell EQL technical support told me to shut down all VMs before performing the EQL firmware update. Luckily I didn’t listen to their advice, as I vaguely remembered that an EQL FW update can be done live without shutting down the hosts or VMs.

Later I confirmed this with Danny. Thanks again! This is where Equallogic local support (HK and China region) has started to worry me a lot recently: open cases have no longer been handled by US EQL staff since the middle of last year, and the technical knowledge and quality are just not there at all. Worse, at times they ask their customers to do ridiculous things, or suggest that customers try something without having performed the steps themselves. I asked them: what if I had thousands of VMs, would I need to shut all of them down just to perform the firmware upgrade? The answer was Yes! HUH???? BS!!!

Yes, I am complaining that the local Equallogic support service is getting poorer. Please correct this issue ASAP!

Update: April 6, 2012

OM-SrvAdmin-Dell-Web-LX-7.0.0-4614_A00.tar.gz is out today!
