Category: Equallogic & VMWare (Virtualization Technology)

On the Equallogic Firmware Naming Scheme of the Past Two Years

By admin, December 24, 2013 12:24 pm

To be honest, I am still using v5.2.2. v6 could at most be called 5.4, and v7 is really just 5.7.

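EQL’s naming scheme over the past year or two is simply self-deception. It may be good for scaring newcomers, but behind it there is really just a paper tiger. That is why the v6 and v7 updates of the past year have not drawn nearly as much response in the EQL community as 5.1 did, and the so-called new features are of rather marginal value.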

By the same logic, it is no surprise that ESX4.1 > ESX5.5 (which should really be called ESX4.8) has not drawn as much response as before.

If it ain’t broke don’t fix it

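Also, as a senior Dell L2 engineer in China advised me years ago: if nothing is wrong, don’t drop a rock on your own foot. Please don’t upgrade the firmware and drivers of every piece of hardware at the slightest excuse; stability must come first.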

e.g., http://communities.vmware.com/message/2300998

Last but not least, keep your network & storage design simple but elegant!

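Finally, I have long had the following question, and over the years I have never gotten a good answer:

The chance of an individual member failing should grow as you add more EQL members to a group, so isn’t that dangerous? Think about it: a group of 16 members must have a higher chance of an individual hardware failure than a group of 4 members.

If it were me, I would split the 16-member group into 4 groups of 4 members each, mainly because I would not feel safe otherwise.

So what allocation is the best balance?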
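
One rough way to frame the question above is with basic probability. The sketch below assumes every member fails independently with the same probability over some period; the 2% figure is purely hypothetical and is not a Dell or EqualLogic statistic.

```python
def p_any_failure(members: int, p_member: float) -> float:
    """Probability that at least one of `members` fails, assuming independence."""
    return 1.0 - (1.0 - p_member) ** members


# Hypothetical per-member failure probability over some period (an assumption).
P_MEMBER = 0.02

one_group_of_16 = p_any_failure(16, P_MEMBER)
one_group_of_4 = p_any_failure(4, P_MEMBER)

# Four separate 4-member groups: chance that at least one group sees a failure.
any_of_four_groups = 1.0 - (1.0 - one_group_of_4) ** 4

print(f"one 16-member group:        {one_group_of_16:.1%}")
print(f"one 4-member group:         {one_group_of_4:.1%}")
print(f"any of 4 x 4-member groups: {any_of_four_groups:.1%}")
```

Under these assumptions, splitting does not lower the chance that some member fails somewhere; it only lowers the chance that any particular group is hit, so the trade-off is about how much is affected by a single failure rather than how often failures occur.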

Biggest Disappointment About vSphere 5.5 New Feature AppHA (Application High Availability) by Veeam

By admin, September 23, 2013 10:39 am

Just read the following in the latest Veeam Community Forums Digest and it’s quite interesting.

In fact, I use a much simpler method in a Windows environment: I simply set the particular services to restart themselves should there be any failure. It has worked perfectly so far, with no hassle at all. :)
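
For reference, the service recovery setting described above can also be scripted. The sketch below only builds the argument list for the Windows `sc.exe failure` command and does not run it; the service name `Spooler`, the 60-second restart delay, and the one-day reset period are illustrative assumptions, not values from my setup.

```python
def build_sc_failure_command(service_name, restart_delay_ms=60000,
                             reset_period_s=86400, attempts=3):
    """Build the `sc.exe failure` arguments that restart a service on failure.

    `sc` expects the value after `reset=` and `actions=` as a separate
    argument, so each option and its value are separate list items.
    """
    # One "restart/<delay in ms>" entry per failure (first, second, subsequent).
    actions = "/".join(f"restart/{restart_delay_ms}" for _ in range(attempts))
    return ["sc.exe", "failure", service_name,
            "reset=", str(reset_period_s),
            "actions=", actions]


# Example with a hypothetical service name; print the command instead of running it.
command = build_sc_failure_command("Spooler")
print(" ".join(command))
```

On a Windows host, the resulting list could be passed to `subprocess.run(command, check=True)` from an elevated prompt; the same settings are available on the Recovery tab of the service properties.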

You may remember after sorting through all of the vSphere 5.5 features a few weeks ago; I was most excited for the vSphere AppHA (Application High Availability). Well, I have to admit it turned into my biggest disappointment based on some hands-on experience.

The theory behind this feature sounded excellent: in addition to vSphere HA (high availability) that VMware provided for a few years now (VM monitoring, with automatic VM restart after VM or host failure), the same will now be possible at the application level (application monitoring, with automatic restart of services and/or VM in case of application failure). And because this will be built right into the platform, it’s going to be transparent and easy to use… or so I thought, based on years watching VMware dishing out incredible functionality that was always integrated, intuitive and “just worked”.

I assumed VMware will simply “enlighten” VMware Tools with the ability to detect known applications and monitor key metrics, and also make this framework extensible for custom applications (similar to pre-freeze / post-thaw scripts for application-specific snapshot logic). In case of application failure detected, VMware Tools would throw events into vCenter and first attempt “local” recovery by restarting services, and if that does not help, message vCenter to restart the VM. This architecture would make AppHA work out of box for every VM (including newly added), with zero hassle for admins: huge value that EVERY user would immediately benefit from.

Well, it appears that I assumed too much. In reality, the feature comes with incredible complexity, and is based on legacy architecture I would not expect leading virtualization vendor to release in 2013. First, this feature is not something built into the platform, but rather completely “glued” on top of it. Before you can even start using this feature, you will need to deploy two separate appliances… yes, one was not enough! The first appliance is Hyperic appliance (recent VMware acquisition), which is Microsoft SCOM like tool with ugly web interface (carrying maybe 10% of SCOM functionality), and sporting identical architecture (thus bringing 100% of SCOM complexity along). Second appliance is actual VMware AppHA appliance, which seems to orchestrate “stuff” between Hyperic server and vCenter Server.

And the “best” part? AppHA requires that you deploy special monitoring agents in every VM, so welcome back to the agent management fun we’ve made great strides to avoid (having to remember to install, upgrade, and babysit yet another agent in your VMs). And even worse, you will also need to ensure that every VM is accessible to Hyperic server over the network! Direct network connectivity to a VM from core infrastructure servers? What’s up with that, I thought cloud was all about complete isolation? In other words, just think about all the things you like about agent-free Veeam solutions, remember how you struggled with agent-based solutions before, and apply all that to vSphere AppHA. I totally expected they would simply reuse VMware Tools, because it is the necessary evil we have to live with… but unfortunately, this is not the case.

This is probably the first time ever that VMware delivers the feature that sounds good on paper, but has horrible implementation in reality. It feels very much like a “buy and glue on top” approach, rather than “innovate and build” acquisition. Are we seeing the change of VMware approach to R&D? I honestly hope this was more of an exception, rather than a rule, but this is still worrying and very annoying for me, hardened VMware fan. I will definitely be looking for VMware folks behind AppHA at VMworld Europe next month to discuss this, and understand what’s going on with this feature.

Equallogic Alert: Raid Battery Failed

By admin, July 14, 2013 10:17 pm

Well, it’s about time: the battery normally lasts about 3 years. This is the first time I have encountered this error and the second time the EQL has had a problem; the first was due to a failed disk.

event: 28.4.31
time: Sun Jul 14 21:43:29 2013
NVRAM battery failed. Power failure could result in loss of data.

Critical health conditions exist.
Correct immediately before they affect array operation.
NVRAM battery failed and must be replaced.

[Image: nvram]

There are 1 outstanding health conditions. Correct these conditions before they affect array operation.

Active control module cache is now in write-through mode. Array performance is degraded.

Note that the write latency shoots up right away because write-back mode is disabled, although you can force the array to use write-back mode.

[Image: latency]

Called Dell Pro-Support; they will replace it tomorrow, because I don’t feel like doing it tonight.

Update: 7/15/2013

Dell’s engineer came on site this afternoon and fixed the problem within 5 minutes by simply swapping out the active controller card (the one whose NVRAM battery had failed). The standby controller kicked in almost right away; I noticed only 2 pings were lost on both the grpadmin and VM IP addresses during the controller switchover. Finally, the engineer said I’m probably the first one in Hong Kong to have a battery fail in an EQL; most problems are related to disks, then power supplies, then controller cards.

I can’t think of any reason why anyone wouldn’t like this kind of high redundancy, designed with easy maintenance in mind! Bravo, Equallogic!

Firefox Issue: Equallogic Group Manager Applet Doesn’t Work Properly After Upgrade to Java 7 Update 25

By admin, June 28, 2013 3:25 pm

The EQL GM Java applet stays on a blank screen for about 2 minutes and then throws an exception, as shown in the picture below. Simply ignore the error and the Group Manager login screen will appear again. The problem doesn’t occur if I launch EQL GM in a web browser, strange!

Quite a few people on Dell’s EQL forum seem to have the same problem after upgrading to Java 7 Update 25. Yes, we chose to upgrade because there is a serious security hole in Java 6.

[Image: eqlgm]

This is the warning shown on the Firefox Add-ons page.

[Image: javaplugin]

The other problem is that after I cleared the Java cache, the EQL Group Manager icon on my desktop also disappeared. Does anyone know how to recreate the icon? I certainly don’t want to reinstall HIT for Windows just to get the icon back. :)

Is 7,200 RPM NL-SAS Really Reliable? Think Again!

By admin, June 27, 2013 8:11 am

Just received an alert from Equallogic this morning regarding a hard disk firmware update.

Dell has made improvements in the drive error handling routines of EqualLogic array firmware over the course of the last few years and has worked closely with its drive manufacturers to improve the error handling routines of the hard drives.

We have released the newest version of hard disk drive firmware, EC04, for the below listed 7200RPM based 1TB, and 2TB drives shipped on the PS4100E, PS6100E, PS6110E, and the PS6110E arrays

  • Toshiba 7200 RPM NL-SAS MK1001TRKB (1 TB)
  • Toshiba 7200 RPM NL-SAS MK2001TRKB (2 TB)

If you are using arrays with these drives, Dell strongly recommends that you update the hard disk drive firmware.

I recall receiving the same kind of alert at least 3-4 times in the past 3 years for 7,200 RPM SATA/NL-SAS firmware updates, and none for SAS. Worse, many users have reported frequent 7,200 RPM disk failures or false positives. In addition, past EQL firmware updates have repeatedly indicated problems with error detection or false positives on the 7,200 RPM disks. So I think this gives you a clear picture of how reliable those slower disks can be. Now that disks are moving to 4TB each, I don’t think it would be a pleasant scenario to see one of them fail.

The good thing is that Equallogic has always worked closely with the disk vendors to improve reliability over the years. That’s why we see improved “hard drive monitoring intelligence with an advanced predictive reliability algorithm” built into its latest firmware again.

We have released recommended software updates for EqualLogic PS Series Arrays: Firmware versions 6.0.5 and 5.2.9, which include key maintenance fixes.
Notably, the v6.0.5 release includes recent improvements to hard drive monitoring intelligence with an advanced predictive reliability algorithm. This algorithm is designed to help preserve overall system reliability and long-term performance by proactively identifying drives which are at risk for failure, copying their data, and allowing you to safely replace them. In a small percentage of storage arrays, this process will occur shortly after the array firmware is updated. More details are included in the release notes. Version 6.0.5 also removes a false error warning that appeared on some arrays following drive replacement, and includes other fixes.

We recommend that you move to the v6 firmware stream and adopt v6.0.5. However, for customers staying on the 5.x code stream, we have released v5.2.9 which includes the drive reliability algorithm mentioned above and additional fixes.

Interesting Post: Why Equallogic Doesn’t Support Active-Active Controllers

By admin, May 15, 2013 7:29 pm

Saw this interesting post today, dated almost a year ago.

Q: Why does EqualLogic not support having active/active controllers?
A: This is a very good question. EqualLogic runs the active and passive controllers connected by a thick I/O pipe that effectively maintains the passive controller as a mirror of the active controller. This allows for near-instantaneous failover in the event of a RAID controller failure; there is no need for the controller to seize ownership of the failed controller’s disks. This is supported by write cache mirroring, and the write cache is cached to flash memory.

Note: The process of controller failover uses MAC spoofing and needs portfast and rapid spanning tree enabled on switches.

Q: Does EqualLogic support a Thin Provisioned LUN Space Reclaimer?
A: Not yet, this is in the pipeline.

Also came across this reply in Dell’s Forum.

Some terminology might be helpful here. Equallogic embeds their controller/filer/and disk shelves into one unit. The controllers are active/passive meaning only one controller is ever usable. The filer itself is tied directly to the disks. Other vendors handle this in different ways. Dell has chosen with the Equallogic system to do this.

Some vendors implement “raid” across the filers themselves (HP LeftHand’s network raid). Other vendors offer active/active controllers, or NetApp metrocluster functionality. Dell Equallogic does not.

We operate three Equallogic arrays in production use, and have never suffered a controller failure. When we perform firmware updates the unit reboots twice, taking it offline for 15 seconds. We do this during ‘quiet’ activity hours on our VMware, SQL Server, and Exchange clusters. They seem to handle the 15 second downtime without issue.

We have not seen the Active/Passive controller layout of the Dell Equallogic as a negative. The failure of an entire Equallogic filer (both controllers and both power supplies) is extremely rare. There are no shared components between the controllers; they are functionally separate filers. The unit is right-sized for our organization and provides enterprise functionality at a fraction of the cost of a similar product that would allow Active/Active enterprise level controllers, or Metrocluster functionality.

In summary:
> Dell Equallogic does not allow Active/Active Controllers, or full ‘raid’ between discrete units.
> Dell does not offer ‘Metrocluster’ or ‘Network Raid’ functionality like DRBD.
> Reboots of the entire SAN take 15 seconds (yes, really, as a customer, not Dell marketing) and do not cause any issue for us.

OpenStack, An Alternative to VMware and Hyper-V?

By admin, May 10, 2013 9:39 am

In July 2010, Rackspace Hosting and NASA jointly launched a new open source cloud initiative known as OpenStack. The mission of the OpenStack project was to enable any organization to create and offer cloud computing services running on standard hardware. The community’s first official release, code-named Austin, was made available just four months later with plans to release regular updates of the software every few months. The early code comes from NASA’s Nebula platform as well as Rackspace’s Cloud Files platform. Early on in the history of the project, the Ubuntu Linux distribution decided to adopt OpenStack.

Just read about this new buzzword. Very interesting indeed, as OpenStack was initiated by a leading web hosting company in the US, and there is an extremely self-conceited man in China who said that he will replace VMware with OpenStack within a year… Gosh!

Beyond Five Nines 99.999% Availability with Dell Compellent

By admin, May 5, 2013 11:15 am

Five nines or 99.999% availability standard (5.26 mins downtime per year) has its origins in the telecom industry. It characterizes the technical capabilities of an individual system. It does not characterize the capability of an organization to use the technology to meet its goals. To measure the impact of technology on an organization requires consideration of the entire IT environment and its effectiveness as a whole in providing access to data.
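
The 5.26-minute figure above follows directly from the availability percentage. As a quick check, the sketch below converts an availability level into allowed downtime per year; using an average year of 365.25 days is my assumption, and a 365-day year gives nearly the same result.

```python
# Average year length in minutes, including leap years (an assumption).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime in minutes per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label} ({availability:.3%}): {minutes:.2f} minutes/year")
```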

Dell Compellent: Going beyond five nines

By virtualizing physical resources, Dell Compellent Storage Center achieves a higher level of abstraction that overcomes the limitations of traditional storage, allowing you to perform routine management and maintenance without taking down the applications users rely on to keep business moving.

At Dell, we’re addressing data accessibility issues with a 24x7x365 mentality. The Dell Compellent family offers 99.999% availability by the standard measure, but we go beyond the concept of five nines. We take planned data downtime into consideration in our approach to building our hardware, the technologies behind our software and in our unique, award-winning support.

Building high availability into hardware

The foundation of continuous data availability is based on a hardware environment in which users can access the data during activities that traditionally require downtime, both unplanned and planned.
Dell removes the potential for a single point of failure from the whole environment rather than merely moving that single point of failure around within the environment. Our approach is to provide a hardware environment in which accessing data uses no shared components.

Clustering dual storage controllers with no shared backplanes or midplanes, and then connecting that fully redundant cluster to storage devices in a multi-loop or multi-chain configuration, provides redundancy at all points and allows for a hardware environment that is highly available. Providing reliability and redundancy in components most likely to fail—power supplies, fans, spinning disk drives—contributes to this infrastructure of availability. Wherever possible, components are designed to be hot swappable, eliminating downtime for maintenance and repair.

Data management traditionally requires planned downtime. Virtualization can help change that. Dell Compellent virtualizes storage at the drive level, enabling you to create high performance, highly efficient virtual volumes in seconds, without allocating drives to specific servers and without complicated capacity planning and performance tuning. Read/write operations are spread across all drives in your virtualized pool of storage, so multiple requests are processed in parallel, accelerating data access.

In addition, you can change and scale your virtualized storage dynamically without disruption or downtime. Start with a single controller, add a second controller, join the two into a cluster, all while allowing access to the data. Add drives and drive enclosures, replace fans and power supplies—even go inside a controller and replace interface cards to upgrade or fix hardware issues, with your data accessible to users all the while.

A hardware environment built on this blueprint can keep data accessible during activities that traditionally had a negative impact on business, and keep the organization moving forward.

Building high availability into software

Software that contributes to data availability includes automated data placement, virtualization, and data protection solutions. Again, the Dell Compellent approach is holistic, and focused on keeping data optimally accessible for users—in any circumstance, and at every stage in its lifecycle.

Dell Compellent storage software features built-in automation that optimizes the provisioning, placement and protection of data throughout its lifecycle. For example, storage tiering enables an organization to keep data available cost effectively. Data Progression, Dell Compellent’s patented tiering technology, automatically classifies and migrates data to the optimum storage tier and RAID level based on actual usage. As shown below, all new data is written to Tier 1, RAID 10 and snapshots cascade to the lowest available tier within 24 hours. Then, the most active blocks of data remain on high-performance drives, while less active blocks automatically move to lower-cost, high-capacity drives. Under this approach, your storage is optimally utilized, data is easily recoverable and users and applications have fast access to the data they need.

[Figure: Dell Compellent automated tiered storage dynamically classifies and migrates data to the optimum tier based on frequency of access.]

Another way Dell Compellent builds high availability into storage software is Dell Compellent Data Instant Replay, sometimes referred to in the industry as “snapshot technology” or “continuous data protection.” A Replay is similar to a snapshot in that it captures a point-in-time copy of data; however, it has intelligence that lets you access read-only data without having to make a copy of that data. You can take continuous, space-efficient snapshots to speed local recovery of lost or deleted files. Once an initial snapshot of a volume is taken, only incremental changes in the data need to be captured. Every Replay is a readable and writable volume that is automatically stored on lower-cost drives, and can be used to recover any size volume to any server in less than 10 seconds.

Remote Instant Replay leverages Replays between local and remote sites for cost-effective disaster recovery and business continuity solutions. After initial site synchronization, only incremental changes in data are replicated on an ongoing basis, cutting hardware, bandwidth and administration costs. You can replicate over Fibre Channel or native IP as your business requires.

Dell Compellent Live Volume enables dynamic business continuity by letting you move storage volumes between Dell Compellent arrays on demand. All migration occurs transparently while applications remain online. Live Volume functionality is fully integrated in the Dell Compellent platform and requires no additional hardware, server agents or costly appliances. It supports any virtualized server environment and complements leading virtual machine movement engines.

Array RAID configurations and associated RAID sets

By admin, April 29, 2013 12:14 pm

The tables show a logical drive layout when an array is initialized for the first time. The actual physical layout of drives can change and evolve due to maintenance and administrative actions. Spare drives can move as they are utilized to replace failed drives and newly added drives become the spares. It is not possible to determine which physical drives are associated with each RAID set. This information is dynamic and maintained by the EqualLogic firmware.

[Image: table of array RAID configurations and associated RAID sets]

PS Series Firmware Compatibility with EqualLogic Tools

By admin, April 29, 2013 12:03 pm

The following table provides a quick reference of EqualLogic product version compatibility for the recent major firmware releases.

[Image: table of PS Series firmware compatibility with EqualLogic tools]
