Strange High Latency (Read) After Equallogic Firmware Upgrade (Solved!)

By admin, April 3, 2012 2:30 pm

I upgraded the firmware on one of our PS6000XVs to the latest v5.2.2 today.

Everything worked as it should: the VMs stayed rock solid (no pings lost), the controller failover took about 10 seconds (i.e., pings to the group IP blacked out for 10 seconds), and the whole upgrade took about 10 minutes to complete, as expected.

Caution: Controller failed over in member myeql-eql01

Caution: Firmware upgrade on member myeql-eql01 Secondary controller.
Controller Secondary in member myeql-eql01 was upgraded from firmware version Storage Array Firmware V5.0.2 (R138185)

Caution: Firmware upgrade on member myeql-eql01 Primary controller.
Controller Primary in member myeql-eql01 was upgraded from firmware version to version V5.2.2 (R229536)
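The 10-second blackout quoted above is the kind of number you can pull out of a probe log instead of eyeballing a ping window. Below is a minimal sketch, assuming the probe results are already available as (timestamp, reply received) pairs; the function name and the sample data are hypothetical and are not taken from the actual upgrade.

```python
from typing import Iterable, Tuple


def longest_blackout(probes: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest outage in seconds.

    Each probe is (timestamp_in_seconds, reply_received). An outage runs from
    the first lost probe to the first successful probe after it. An outage
    that never recovers inside the log is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in sorted(probes):
        if not ok and outage_start is None:
            outage_start = timestamp          # first lost probe of this outage
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None               # outage is over
    return longest


# Hypothetical log: one probe per second, replies lost from t=5 to t=14.
sample = [(float(t), not (5 <= t < 15)) for t in range(30)]
print(longest_blackout(sample))
```

With one probe per second the result is only accurate to about a second, so a shorter probe interval gives a tighter figure.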

However, various problems started to occur after the upgrade, mainly high TCP retransmits, high disk read latency, and a failed fan on the active controller module. Besides, the EQL battery temperature also went up by 5 degrees compared to its original state. (Something going on in the background is surely contributing to this rise.)

1. High TCP Retransmit (SOLVED)

The IOMeter benchmark dropped by almost 90% and high TCP retransmits started to occur. I re-installed MEM on the ESX hosts and rebooted them, but nothing changed.

Then I rebooted the PowerConnect 5448 switches one by one, and this solved the problem completely. But why would an Equallogic firmware upgrade require the switches to be rebooted? Was something cached in the switch, such as ARP or MAC entries? I really don’t know; maybe this is one of those times we say “Wow, It Worked~ It’s Magic!”

2. High Disk Read Latency (Remains UNSOLVED)

This PS6000XV used to have latency below 6ms; it’s now 25-30ms on average. The funny thing is that whenever IOPS is extremely high, in the 9,000 range (I use IOMeter to push my array to its max), the latency becomes really low, in the 5ms range.

Conversely, whenever IOPS is extremely low, in the 5 to 70 range after I stopped IOMeter, the latency jumps sky high into the 120-130ms range.

All of this was observed using the Live View tool in the latest SAN HQ v2.2, which I really like!

All the individual volume latencies added together still come to 5-6ms, so where is the extra 20-something ms of HIDDEN latency coming from?

I contacted US EQL support, as local support has no clue whatsoever, as usual. The engineer told me it could be due to a metadata remapping process running in the background after the firmware upgrade, and that I need to wait anywhere from a few to 24 hours for it to return to normal. To be honest, I’ve never heard of such a thing, nor could I find anything on Google about it (i.e., disk metadata needing to be remapped after a firmware upgrade).

OK, things are still the same after almost 48 hours, so I doubt this is the problem; ps aux shows no such process running on the array.

Remember that my controller temperature also went up by almost 25%, indicating something is working the storage processor hard. Could this be another good indicator that my PS6000XV is still doing some kind of background metadata integrity check, and that it boosts this check whenever it senses low IOPS, which is why we see the high latency?

Anyway, this problem remains a mystery. I don’t have any performance issue, so it can be ignored for the time being. I think only time will tell, once the background disk I/O completes its job and latency hopefully returns to normal.

In fact, I hate to say this, but I strongly suspect it’s a bug in Equallogic firmware v5.2.2. If you have any idea, please drop me a line, thanks.

3. Fan of Active Controller Module Failed (SOLVED)

When the active controller failed over, Module 0 Fan1 went bad. It turned out to be a FALSE ALARM: the fan returned to normal after a second, manual failover.

Oh…the ONLY good thing to come out of the whole firmware upgrade is that TCP retransmit is now 0% for 99.9999% of the time, and I do sense IOPS is 10% higher than before as well.

I saw a single spike of 0.13% only once in the past 24 hours. Um… IT’S JUST TOO GOOD TO BE TRUE and SOUNDS TOO SUSPICIOUS to me, as TCP retransmit used to sit in the 0.2% range all the time.
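To put the 0.13% and 0.2% figures into perspective, here is a small sketch of how a retransmit percentage is commonly derived from segment counters. It assumes the percentage is simply retransmitted segments over segments sent; how SAN HQ computes its own number is an assumption here, and the counter values are made up for illustration.

```python
def retransmit_percent(segments_sent: int, segments_retransmitted: int) -> float:
    """Retransmitted segments as a percentage of all segments sent."""
    if segments_sent <= 0:
        return 0.0  # nothing sent, so nothing to report
    return 100.0 * segments_retransmitted / segments_sent


# Hypothetical counters: out of one million segments sent...
print(retransmit_percent(1_000_000, 2_000))  # the old "0.2% range"
print(retransmit_percent(1_000_000, 1_300))  # the single 0.13% spike
print(retransmit_percent(1_000_000, 0))      # the new normal
```

Seen this way, 0.2% means roughly 2 segments in every 1,000 had to be sent again.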

Update May 1, 2012

The mystery has finally been SOLVED after almost a full month!

I disabled Delayed Ack on all the ESX hosts in the cluster, and after rebooting the hosts one by one, I saw the high latency issue disappear for good! It’s back to the normal 3.5-5.0ms range (after 12:30pm).

The high read latency problem was indeed due to Delayed Ack, which is enabled by default on ESX 4.1. As Don Williams (EQL’s VMware specialist) also stated, Delayed Ack adds artificial (fake) extra latency to your Equallogic SAN, which is why we saw those false positives in SANHQ.

In other words, SANHQ was deceived by the fake latency numbers induced by Delayed Ack, which is what made this problem look so strange.

It has nothing to do with our switch settings. Still, this doesn’t explain why EQL firmware v5.0.2 and earlier didn’t have this problem, so it might still be related to a firmware bug in v5.1.x or v5.2.x that triggers the high latency on ESX/ESXi hosts with Delayed Ack enabled (the default).

Finally, IOMeter shows a 10-20% increase in IOPS after upgrading to firmware v5.2.2 (with or without Delayed Ack disabled, actually).

Again, I greatly appreciate the help from Equallogic’s technical support team. We have done many direct WebEx sessions with them, and they have always been patient and knowledgeable. They know their stuff, especially Ben, the EQL performance engineer, who also gave me an insightful lecture on using SANHQ. I’ve learnt many useful little things that will help me troubleshoot my environment in the future!

And of course Joe, who’s responsible for EQL social media. It proved Equallogic is a great company that cares about its customers, no matter how large or small, and this really makes me feel warm. :)

Are Equallogic Arrays NetBSD Based?

By admin, April 3, 2012 1:24 pm

I had a WebEx session with US Equallogic Support yesterday, and I saw the following when the engineer had direct access to the EQL PS6000XV console.

cli-child-3.2# uname -a
NetBSD 1.6.2 NetBSD 1.6.2 (EQL.PSS)

Also, during the boot process of the PS6000XV, the active controller module shows 4 cores:

All slave cpus (16) ack’ed userapp init
count = 4, total =4
All slave cpus (4) ack’ed message ring init

For a list of undocumented Equallogic CLI commands, see this link.

I was also told the CPU in the PS4000 series is dual-core, which is why it’s slower than the PS6000 series.

PCI-e 3.0 is here, 800MB Per Lane!

By admin, April 2, 2012 9:45 pm

I noticed the latest PowerEdge R720 is already available on Dell’s page, but after studying the specs a bit, I think I will skip the 12th generation for the time being.

Mainly because the Xeon E5 is very similar to the Xeon 5600: it’s still 32nm, and it only has 2 more cores (i.e., 8 cores) and a bit more cache (i.e., 2.5MB/core vs 2MB/core). The performance is not impressive at all; the extra 2 cores only contribute an additional 40% in the VMmark benchmark. Not to mention that CPU is never the bottleneck anyway; I/O and memory are!

Yes, it has 24 DIMM slots vs 16 on the R710, so that’s 64GB more with 8GB DIMMs being mainstream now, but ESX 5.0 Enterprise Plus has a 96GB per socket glass ceiling, so there is no point in upgrading again for this.
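The memory argument above is simple arithmetic, sketched below. The slot counts, DIMM size and 96GB per socket figure are the ones quoted in this post; the dual-socket host is an assumption added for the example.

```python
DIMM_GB = 8                  # mainstream DIMM size quoted in the post
R720_SLOTS = 24              # slot count as quoted in the post
R710_SLOTS = 16              # slot count as quoted in the post
SOCKETS = 2                  # assumption: a dual-socket host
CEILING_GB_PER_SOCKET = 96   # ESX 5.0 Enterprise Plus figure quoted in the post

extra_gb = (R720_SLOTS - R710_SLOTS) * DIMM_GB   # memory gained by the extra slots
r710_total_gb = R710_SLOTS * DIMM_GB
r720_total_gb = R720_SLOTS * DIMM_GB
ceiling_gb = SOCKETS * CEILING_GB_PER_SOCKET     # per-host ceiling for two sockets

print(f"extra memory: {extra_gb}GB")
print(f"R710 full: {r710_total_gb}GB, R720 full: {r720_total_gb}GB")
print(f"two-socket ceiling: {ceiling_gb}GB")
```

Under these assumptions a fully populated R720 with 8GB DIMMs lands exactly on the two-socket ceiling, and anything larger than 8GB DIMMs would go past it.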

The only two things that can really sell it are:

1. A lot more disks can be fitted into the 2U chassis than before: 12 x 3.5″ or 16 x 2.5″ drives.

2. PCI-e 3.0 is finally here! 800MB per lane, so x4 is 3.2GB/s. The new PERC H810 x8 with 1GB cache is designed exactly for this kind of huge I/O. Finally we can use 2 or more SSDs to break that 3GB/s limit!
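The 3.2GB/s figure above is just 800MB x 4 lanes. For comparison, the raw line rate of PCI-e 3.0 is 8 GT/s with 128b/130b encoding, which works out to roughly 985MB/s per lane before any protocol overhead, so 800MB per lane can be read as a conservative usable estimate (that reading is an assumption, not something the spec sheet states). A quick sketch of the calculation, with a hypothetical helper name:

```python
def lane_mb_per_s(gigatransfers_per_s: float, payload_bits: int, line_bits: int) -> float:
    """Raw per-lane bandwidth in MB/s (1 MB = 10**6 bytes), ignoring protocol overhead."""
    payload_bits_per_s = gigatransfers_per_s * 1e9 * payload_bits / line_bits
    return payload_bits_per_s / 8 / 1e6


gen2_lane = lane_mb_per_s(5.0, 8, 10)      # PCI-e 2.0: 5 GT/s, 8b/10b encoding
gen3_lane = lane_mb_per_s(8.0, 128, 130)   # PCI-e 3.0: 8 GT/s, 128b/130b encoding

for lanes in (4, 8):
    print(f"x{lanes}: gen2 {gen2_lane * lanes:.0f}MB/s, gen3 {gen3_lane * lanes:.0f}MB/s")

# The figure used in the post: 800MB per lane.
print(f"post estimate for x4: {800 * 4}MB/s")
```

Either way, an x8 card such as the H810 has plenty of headroom above the 3GB/s mark.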

Upgrade Equallogic MEM to v1.1.0

By admin, March 31, 2012 5:10 pm

The reason I chose to upgrade MEM first is that the MEM 1.1.0 release notes say the minimum EQL FW is v4.3.7 (I’m on v5.0.2), while FW v5.2.2 states clearly that it only supports MEM 1.1.0. So I need to upgrade MEM first, then upgrade my EQL firmware to v5.2.2, and add HIT/VE v3.1.1 later.
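The ordering logic above can be written down as a small check. This is only a sketch of the reasoning, using the version numbers quoted in this post; the function and constant names are hypothetical and nothing here talks to a real host or array.

```python
def parse_version(text):
    """Turn 'v5.0.2' or '1.1.0' into a comparable tuple such as (5, 0, 2)."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


MEM_110_MIN_FW = parse_version("v4.3.7")      # minimum FW per the MEM 1.1.0 release notes
FW_522_REQUIRED_MEM = parse_version("1.1.0")  # FW v5.2.2 only supports MEM 1.1.0


def upgrade_order(current_fw, installed_mem):
    """Return the upgrade steps in a safe order for the given starting point."""
    steps = []
    if parse_version(installed_mem) != FW_522_REQUIRED_MEM:
        if parse_version(current_fw) < MEM_110_MIN_FW:
            raise ValueError("current firmware is below the minimum for MEM 1.1.0")
        steps.append("upgrade MEM to 1.1.0")
    steps.append("upgrade firmware to v5.2.2")
    return steps


# Starting point in this post: FW v5.0.2 with MEM 1.0.0 installed.
print(upgrade_order("v5.0.2", "1.0.0"))
```

Comparing tuples of integers avoids the classic trap of comparing version strings, where "4.10" would sort before "4.3".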

1. Remember to put your ESX host to Maintenance Mode FIRST!!!

2. Use the MEM installation script, which I found to be the simplest method. Open the vSphere CLI and run:

C:\>setup.pl --install --server=xxx.xxx.xxx.xxx
You must provide the username and password for the server.
Enter username: root
Enter password:

Upgrading from existing Dell EqualLogic Multipathing Extension Module installed:
DELL-eql-mem-1.0.0.130413
Defaulting to offline bundle 'C:\dell-eql-mem-esx4-1.1.0.222691.zip'
The install operation may take several minutes.  Please do not interrupt it.
Upgrade install was successful.
You must reboot before the new version of the Dell EqualLogic Multipathing Extension Module is active.

3. Remember to reboot the host as well if it runs ESX 4.1.

4. Check that MEM is installed correctly. I found that the MEM settings had been reset to their defaults, so I needed to change them back to the maximum to utilize all 4 paths (i.e., VolumeSessions=12, MemberSessions=4).

C:\>setup.pl --query --server=xxx.xxx.xxx.xxx
You must provide the username and password for the server.
Enter username: root
Enter password:

Found Dell EqualLogic Multipathing Extension Module installed: DELL-eql-mem-esx4-1.1.0.222691
Default PSP for EqualLogic devices is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03b90b5b1d654e6e700d0fc is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078c06ba23424c914a0f1889d68 is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03bd0aad3cc34ade700500f is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03b80201dcbe4a5e70010ee is DELL_PSP_EQL_ROUTED.
Active PSP for naa.6090a078e03b40e895cc74abe70000fd is DELL_PSP_EQL_ROUTED.
Found the following VMkernel ports bound for use by iSCSI multipathing: vmk2 vmk3 vmk4 vmk5

C:\>setup.pl --listparam --server=xx.xxx.xxx.xxx
You must provide the username and password for the server.
Enter username: root
Enter password:

Parameter Name  Value Max   Min   Description
--------------  ----- ---   ---   -----------
TotalSessions   512   1024  64    Max number of sessions per host.
VolumeSessions  6     12    1     Max number of sessions per volume.
MemberSessions  2     4     1     Max number of sessions per member per volume.
MinAdapterSpeed 1000  10000 10    Minimum adapter speed for iSCSI multipathing.

C:\>setup.pl --setparam --name=VolumeSessions --value=12 --server=xxx.xxx.xxx.xxx
C:\>setup.pl --setparam --name=MemberSessions --value=4 --server=xxx.xxx.xxx.xxx

* Make sure you have 4 NICs for iSCSI before setting MemberSessions to 4 or you will see high TCP Retransmit.

5. Double check the storage path in vCenter to make sure MEM is the default PSP and all 4 paths are Active.
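The parameter changes in step 4 are easy to get wrong by hand, so here is a small sketch that checks a value against the Min/Max columns of the listparam output before building the setup.pl command line. The limits are copied from the table above; the helper names, the example server address and the "no more member sessions than iSCSI NICs" rule (generalized from the note about needing 4 NICs) are assumptions for illustration. The sketch only builds the command string and does not run it.

```python
# (min, max) limits copied from the listparam output above.
MEM_PARAMS = {
    "TotalSessions": (64, 1024),
    "VolumeSessions": (1, 12),
    "MemberSessions": (1, 4),
    "MinAdapterSpeed": (10, 10000),
}


def setparam_command(name, value, server):
    """Build the setup.pl command line after checking the value is in range."""
    low, high = MEM_PARAMS[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}..{high}")
    return f"setup.pl --setparam --name={name} --value={value} --server={server}"


def member_sessions_ok(iscsi_nics, member_sessions):
    """Assumed rule of thumb: no more member sessions than iSCSI NICs."""
    return member_sessions <= iscsi_nics


server = "192.0.2.10"  # hypothetical address, replace with your ESX host
if member_sessions_ok(iscsi_nics=4, member_sessions=4):
    print(setparam_command("VolumeSessions", 12, server))
    print(setparam_command("MemberSessions", 4, server))
```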

OpenManage 6.5 Update on ESX 4.1, Poor Equallogic Local Support!

By admin, March 31, 2012 1:56 pm

Today I finally got time to update some of the major components such as OMSA, EQL MEM, EQL FW, Dell BIOS, etc.

The first thing I did was upgrade OpenManage to the latest version 6.5 on the ESX 4.1 hosts.

The whole thing is actually very easy: simply download OM-SrvAdmin-Dell-Web-LX-6.5.0-2247_A01.15.tar (762MB). Yeah, I know the Dell site is confusing, but this is the right one to download. Then install the package according to the Short Installation Guide on the download page.

The setup script actually performs an automatic upgrade. Oh…there are some warnings saying things such as symbolic links were not found, but don’t worry. Just remember to start the OMSA (wsman) service again manually. There is no need to reboot the host or even put it into maintenance mode; everything was performed live without any problem.

However, there are two minor warnings:
1. “Unsupported browser version. See the User’s Guide for minimum supported browser versions.”
OK, I am using the latest Firefox v10 and it’s not compatible. Maybe OpenManage 6.5 A2 or OpenManage 7.0 will include support for it; I found both versions in the Documents section but nothing on the download page, strange.
2. Dell IT Assistant starts to show a yellow exclamation mark and OpenManage starts to complain that my H700 firmware is old and not updated. That’s OK, I will do it later.

Next, it’s time for me to upgrade MEM to v1.1.0 and then EQL FW to v5.2.2. Well, I’ve waited almost a full year since my last update of the EQL components.

Strangely enough, local Dell EQL technical support told me to shut down all VMs before performing the EQL firmware update. Luckily I didn’t listen to their advice, as I vaguely remembered that an EQL FW update can be done live without shutting down the hosts or VMs.

Later I confirmed this with Danny. Thanks again! This is where local Equallogic support (HK and China region) has started to worry me a lot recently: open cases have no longer been handled by US EQL staff since the middle of last year, and the technical knowledge and quality are just not there at all. Worse, at times they ask their customers to do ridiculous things, or suggest their customers try something without having performed the step themselves. I asked them: what if I had thousands of VMs, would I need to shut down all of them just to perform the firmware upgrade? The answer was Yes! HUH???? BS!!!

Yes, I am complaining that the local Equallogic support service is getting poorer. Please correct this issue ASAP!

Update: April 6, 2012

OM-SrvAdmin-Dell-Web-LX-7.0.0-4614_A00.tar.gz is out today!

Black Saturday

By admin, March 24, 2012 5:36 pm

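In 1:18 scale model car collecting there has always been an unwritten rule: a black supercar usually multiplies in value after a while, probably because fewer of them are produced. Famous examples are too many to count, such as the UT Ferrari 355 GTB/GTC, the Autoart Lamborghini Countach and Diablo VT/Roadster/GTR, the Kyosho Ferrari 328 GTB, and the Minichamps Bentley Continental GT. The market price of these "rare animals" can be 2-3 times the original, and of course it is also very common for them to have a price but no market.

Today is a Saturday with no work. I had planned to sleep until late in the morning, but well before that a low engine growl outside the window woke me from my sweet dream. Damn it! Who is so inconsiderate as to park without switching off the engine? But my subconscious immediately told me this had to be a "prancing horse". My body involuntarily sprang out of bed in 2 seconds, then took another 3 seconds at top speed to grab the camera from the drawer and run to the window. Sure enough, I had guessed right: it was a rare black F430. From focusing to shooting, the whole thing took about 10 seconds, and only then did I come to my senses and realize I was already out of bed, haha...

When I came back in the afternoon, I found another 3 black cars parked side by side downstairs, and the Audi TT among them was really cute.

So many black cars today; it turns out it’s Black Saturday!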

Dell Acquires SonicWall: Oh NO!

By admin, March 16, 2012 2:11 pm

Sometimes it really makes you wonder why big corporations collect rubbish just to expand their portfolios.

Well, I was absolutely speechless when I saw the headline today, so it’s time to sell your NASDAQ:DELL. :)

Happiness Really Is This Simple

By admin, March 14, 2012 8:57 pm

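Yesterday’s tennis get-together brought in two old friends I hadn’t seen for a long time, and the four of us played a relaxed and fun mixed doubles match.

Three days ago I passed the 3-year mark of my happy tennis life, and the return of these two old friends made me recall how, over these years, fate led me to meet so many friends on the court.

If, at seventy or eighty, I can still gallop around the court with these tennis friends who have accompanied me through life, how wonderful that would be.

I am very grateful for their constant encouragement and selfless sharing, which rekindled my passion for tennis back then. My progress is slow, but it is there for all to see. (Said shamelessly, haha...)

Today’s reunion made me feel once again that happiness can be very simple: a few like-minded friends get together, chat about everything under the sun, play a match, work up a sweat, and you feel great all over!

Happiness really is this simple. How hard did you think it was? Haha...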

What a Coincidence: Fate in the Online World

By admin, March 13, 2012 10:02 pm

Yesterday I bought a second-hand Fortigate 60B through a famous computer user group in Hong Kong.

During the trade, I found out that the seller, Danny, also uses Equallogic at his company. Umm…the way he talked reminded me a lot of someone I must have come across before, but I couldn’t figure out exactly who he was at the time.

Later, after I came back and opened my blog, it hit me. Danny…isn’t there also a Danny who has often commented on my blog about EQL products? I double-checked the email on my blog against the one the seller provided. Bingo, he’s the same person, what a miracle! The virtual world finally touched reality during the trade without either of us knowing it, so weird and funny!

This reminds me of Disney’s well-known song “It’s a Small World”. I think one of these days we (anyone who is interested in EQL products) should form an informal Hong Kong Equallogic User Group and gather in a coffee shop or something. If anyone has a solid idea, please contribute and make it happen.

Danny, thank you very much again for practically giving away this firewall for me to play with. Personally, I really like playing with this kind of toy (network appliances) in my spare time, because I like their intuitive UIs and innovative features and find them fascinating. And of course, it is exciting that all these toys make you the god of your own domain. Btw, I also have some other fun toys on hand like Netscreen, Top Layer, Radware, BlueCoat, Riverbed, etc. Together with this FTG-60B, they will keep me very busy for a while.

Why Would a Monk Buy a CHANEL Handbag?

By admin, March 13, 2012 11:59 am

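A friend took this photo in Hong Kong. A monk buying a camera/iPhone/iPad I can understand, but a CHANEL handbag?

I am honestly baffled. Is there any reasonable explanation?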
