StackAccel

vSphere: VM Stuck during Power down at 95%

Occasionally I’ve run into a VM that gets stuck at 95% while powering down (or during a vMotion).  I know the issue isn’t unheard of, but I didn’t run into it until working with a few ESXi 4.0.0 208167 servers. So – if you have a virtual machine that hangs while shutting down, you’re certain that it isn’t simply taking a while to finish powering down, and you’ve already tried to “power off” from the client only to have the power-off command stick at 95% – you may have to manually kill the hung VM.

  1. Log in to the host with the hung machine via SSH (enable SSH if you haven’t already).
  2. Run /sbin/services.sh restart (or service mgmt-vmware restart on ESX)… which is the same thing as restarting the management agents from the ESXi console.
  3. This command will restart the agents that are installed in /etc/init.d/ … including hostd, ntpd, sfcbd, sfcbd-watchdog, slpd and wsmand (and HA if you have it).
  4. When you do this, the VI/vSphere client will lose connectivity as those services restart, but VMs that are running will not be affected.
  5. After the services have restarted, you can re-connect via the VI client.
  6. Via SSH, go to the right datastore (such as /vmfs/volumes/DatastoreName/VMname), and delete (rm -r) the *.vswp file (the swap file).
  7. If you can’t delete it, and you’re getting an error message to the effect of… can not remove VM: device or resource busy… go find the processes associated with the VM.
  8. ps auxfww | grep "vmname"
  9. kill -9 ProcessIDNumber
  10. After doing so, remove the orphaned VM from inventory… just right-click the “unknown” VM, and select “remove from inventory”, being careful not to delete it.
  11. Then delete the *.log and *.0* files. If you don’t, re-adding the VM may cause the interface to hang, and you’ll have to go through some of this all over again.
  12. Add the VM back to the inventory, and you should be able to start the VM.
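
If you save the process listing to a file first, picking out the process IDs for steps 8 and 9 can be sketched in a few lines of Python. This is only an illustration, not a VMware tool: the function name and the sample listing are made up, and it assumes an aux-style layout where the PID is the second whitespace-separated column, which may not match the ps output on every ESX/ESXi build.

```python
from typing import List


def find_vm_pids(ps_output: str, vm_name: str) -> List[int]:
    """Return the PIDs of lines in aux-style ps output that mention vm_name."""
    pids = []
    for line in ps_output.splitlines():
        if vm_name not in line:
            continue
        fields = line.split()
        # Assumption: aux-style layout, where the PID is the second column.
        if len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


# Made-up sample listing for illustration only; real output will differ.
SAMPLE_PS_OUTPUT = """\
USER       PID %CPU %MEM COMMAND
root      4321  0.5  1.2 vmware-vmx /vmfs/volumes/DatastoreName/VMname/VMname.vmx
root      5000  0.0  0.1 sshd
"""

print(find_vm_pids(SAMPLE_PS_OUTPUT, "VMname"))
```

Each PID returned is only a candidate for the kill in step 9. Review the matching lines before killing anything, since a plain substring match can also catch unrelated processes (including a grep for the same name).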

I did once run into a situation where a host reboot was the only way to solve the problem. But other than that, this seems to be quite effective.  The short version – see steps 8 & 9.

WordPress Toolkit Series – Part 1, Choosing a CMS

Back in the 90s, before content management systems, intranet sites, and WYSIWYG editors, web-sites were built by hand with HTML code and ASCII text editors.  Folks armed with a bit of knowledge and some patience laboriously constructed web sites.  Even if you happened to use tools like the HotDog HTML editor, bringing web-sites to market was a time-consuming and expensive endeavor.  Fortunately, that’s no longer the case.  We all know the story… web technologies, driven in no small part by the explosive growth of tablets and smartphones, have advanced, and we have a modern ecosystem that was barely hinted at in the 90s.  Underpinning much of the modern web are content management systems (CMS) like WordPress, which enable and accelerate bringing web-sites and projects to market.

My goal with this series is to provide small and medium-sized businesses with a WordPress Toolkit, arming you with everything you need to know to either bring a site to market yourself, or make good decisions when it comes to hiring a company to partner with.

What is a Content Management System

There are probably some use cases where hand-coding web sites still makes sense.  For the rest of us, content management systems (CMS) exist to save us time and money.  A CMS is a piece of software that manages web-site content.  Sounds simple, right?  By that definition there are hundreds of CMS platforms today, incorporating solutions as disparate as blogger.com and WordPress.  You could always choose a CMS platform and hosting option in one, putting your site somewhere like blogger.com… but why would you opt for a free site (e.g. yoursite.blogger.com), where your content is really only serving to increase the value, brand, and awareness of someone else’s property?  While you might choose to augment your business’s content via micro-blogging and social media sites like Tumblr, Twitter, Facebook, Instagram, etc., your presence on those properties should serve to increase the brand and awareness of your business, and bring people to your web site.  Think of your web-site as the hub of your presence, with spokes extending out to other properties that serve to increase your exposure.

To filter down the number of CMS platforms a bit, when I talk about a CMS platform, what I’m really referring to are the mainstream open source options that are common on many hosting platforms.  The top three by market share are Joomla, Drupal, and WordPress.  Of those, WordPress accounts for roughly 75% of the CMS market, powering the infrastructure of more than 100 million web sites, including big sites that you’ve probably heard of like TechCrunch, CNN, TED, and others.

But what about… Joomla, and Drupal?

Both Joomla and Drupal are fine CMS platforms.  I’ve taken a look and kicked the tires a bit.  But here’s my biggest problem… I’ve only had one client ever even mention anything other than WordPress by name.  For the clients that I work with, many have some sort of familiarity with WordPress, even if it’s just that they’ve heard of it before.  And then we talk about the fact that CNN, UPS, TechCrunch, and so many other high-profile sites run WordPress… for many, using WordPress is a foregone conclusion.  Which isn’t to say that WordPress is without its faults, but it is an enabling platform that helps you bring your project online quickly without necessarily needing to dig into the code.

What are some of WordPress’s faults?

There are plenty of naysayers out there for whatever your selection of a CMS platform turns out to be.  With that in mind, in the case of WordPress, the fact that it owns a significant portion of the market has, to some extent, made it a victim of its own success.  Not unlike Microsoft’s Windows platform, or Google’s Android platform.  If you think about the economics involved it quickly becomes obvious: the biggest players in a market tend to be the ones targeted for malware and virus exploits, simply because there are a large number of targets to exploit.  If I were going to identify the single biggest “problem” with WordPress – it’s that its market share makes it a bigger target.  That same market share is also why it gets the most developer attention and why it has the most robust plug-in ecosystem.  This isn’t to say that you’re better off with Joomla or Drupal – particularly if you don’t know much about those platforms, as each has its own vulnerabilities.  But what I am saying is that there’s a tradeoff.

Performance

Another area where you might find folks arguing against WordPress is performance.  Probably the biggest things to keep in mind about performance are that…

  1. For the vast majority of even mid-size organizations, WordPress performance isn’t something you really need to give much thought to.
  2. If WordPress performance is a problem for you, that probably means that you have enough visitors and traffic (and revenue) to deal with the problem.
  3. If you’re a small site with a performance problem, your theme is the most likely reason.

In short though, for most folks, avoiding WordPress because of performance concerns just doesn’t hold water.  For organizations that do have real performance concerns, there are several ways to address them, which we’ll touch on elsewhere in this series.

My recommendation, of course, is to go the WordPress route and make good decisions as you go.  And I’ll be here to help you make good decisions.

Utilization Forecast: Solved with Iceberg

Forecasting utilization is the bane of every business where high-skill employees bill their time in hours.  For far too many companies, juggling individuals, teams, and clients isn’t just an occasional necessity… it’s the status quo.  And as you scale the business, what might have seemed easy with a small team only gets harder the closer you get to your Dunbar number.  It seems like almost overnight, you’ve gone from making strategic decisions to survival mode.  Focusing on resource utilization, improving margins, and holding individuals and teams accountable suddenly seem like luxuries you can’t afford – even when you know that doing so is critical to your success.  To make matters worse, Project Managers who might be able to help drive this are so busy with project schedules, or trying to come up with an S-curve, that they’re unable to contribute in a meaningful way to solving this problem.

But what if there was a better way to do forecasting in a small business?  What if you could drive accountability across a growing organization, eliminate choke-points, scale the business, and at the same time reduce the number of unknowns?

That’s why I developed Iceberg.

Iceberg turns traditional top-down forecasting on its head.  Instead of relying on Project Managers, or individuals in “leadership” positions, to forecast top-down, Iceberg makes it painless for everyone to forecast their own utilization, letting you understand what’s going on beneath the surface of your business.  Three months after implementing Iceberg…

  • Forecast Accuracy improved by 25%
  • Service revenue increased by 15% 
  • The number of bad surprises was cut in half!
  • Leadership began to focus on strategic decisions instead of fighting fires 

Iceberg is available for purchase directly (contact me), and I’m also considering making it available as a SaaS application if there’s sufficient interest – please fill out the survey form.

Watch the beta intro and demo.

Phone System Guide Part 2: Allworx vs. Everybody else

If you haven’t read Part 1 of the guide yet, you might want to go back and read it first.

Even if you’re unfamiliar with the work of Frank Lloyd Wright, his place in American architecture, or the sheer volume of what he produced, you probably recognize Fallingwater.  It was commissioned as a weekend retreat for the Kaufmann family of Pittsburgh, and was completed in 1937 at a cost of about $155,000 (at least $2.6 million in today’s dollars).  It was built for a time, and for clients, that valued quality over quantity and appreciated Wright’s organic style.  It’s a stark contrast to the modern tract-style McMansion.  Let’s contrast Fallingwater with your investment in a phone system.

Phone Systems aren’t beautiful.

They’re not enduring.

They’re not even that unique.

Of the 150 phone systems that I’ve looked at in the past year, they’re more than 90% identical. In fact, the only thing more ubiquitous than their feature-sets is the near universal contempt that business owners have for buying phone systems.  It’s not hard to see why… phone systems have a reputation for being big, expensive, and frustrating.  It’s an epithet earned in earlier product generations.

As I alluded to in Part I, and elaborated on in my free PDF download, before buying a phone system – you should know what you care about.  Not everything, mind you… you don’t necessarily need every detail thought out beforehand (that’s part of why you’re soliciting bids, after all).  But at 30,000 feet… if there are one or two must-have feature requirements for you, be sure you know what those are.  Otherwise, if you bring in the integrators who compete in this space – unless you have a very specific need, or have a situation where there’s a clear incumbent – you’re going to find yourself comparing apples and oranges in a market saturated with feature overlap.

Given no constraints, I’m a big fan of ShoreTel’s Unified Communications Platform.  As much as I respect Asterisk-based systems, and the contributions that Digium has made to the ecosystem, I can’t help but like ShoreTel’s platform.  To the extent that a phone system can be beautifully designed… this one is.  The platform runs on a combination of embedded Linux and VxWorks (the same OS that runs the Mars Curiosity Rover).  Interested in five-nines?  It’s achievable.  Geographic failover, high-volume call centers, fancy mobility features (that few people actually use) – all the boxes are checked.  In other words, ShoreTel’s platform is the phone-system equivalent of a home designed by Frank Lloyd Wright.  Or if that analogy doesn’t strike you, then it’s probably the BMW of phone systems.

But here’s the thing… most of us don’t want, or need, to live in a home built by Frank Lloyd Wright, just as most of us don’t need something like a BMW 7-series.  More than that though, even with some innovative modern takes on the Prairie style home, and renewed general interest in organic architecture – your average executive is going to pick a turn-key tract-style McMansion in the suburbs near the office over a Frank Lloyd Wright original.  Why?  Because the tract-style McMansion is good enough.

Enter the tract-style McMansion of phone systems.

The Allworx Unified Communications platform was conceived of in 1998 by two engineering executives from Kodak and Xerox who formed what later became known as Allworx. During the first few years of business, while the Allworx system was being developed, the company established a presence doing consulting work for big-name organizations like Kodak, HP, Xerox, and Harris. The Allworx phone system came to market at just the right time in 2002. With the platform positioned as a turn-key VoIP telephony solution, it competed feature-for-feature with many higher-cost alternatives, which were often sold via regional telephony providers. As Allworx grew, it was acquired by a publicly traded Fortune 1000 company, before eventually becoming an asset of Windstream Communications in 2011. If this were a different company, then the story would probably have ended in relative obscurity as a nameless corporate asset.  But that doesn’t appear to be the fate for Allworx with Windstream… since 2011, Windstream, leveraging its role as a telephony provider in some markets, has taken clear steps to increase market penetration for the Allworx product line.  As a result, the Allworx platform has experienced revenue growth of more than 250% over the past year. As of last check, the Allworx division had 80 employees, and their product continues to win market share from higher-cost alternatives sold by companies like Cisco, Avaya, ShoreTel, and others.

Allworx is continuing to win business away from the high-cost Enterprise-focused competitors, by providing an innovative, low-cost, and easy to manage alternative with a rapid and continuous release cycle approach to upgrades and platform improvements.

Perhaps calling Allworx the “tract-style McMansion” of phone systems is the wrong tack, as it’s clearly a low-cost platform.  So if ShoreTel is BMW, then I’d venture to call Allworx the Toyota of phone systems.  Allworx is the mainstream, mid-sized economy car of boring, purposeful solutions.  It’s not a kit-car, like FreePBX, nor is it a BMW 7-series.  But you’ll be hard-pressed to go as far at a comparable level of cost as the Allworx platform will take you.  It’s the purpose-built and priced-right unified communications solution for small and mid-sized organizations that want reliability and ease-of-use, without paying a premium for high-end hardware, or obscure feature-sets.  Put differently, a client recently contracted me to help them with phone system vendor selection.  They were a medium-sized, multi-site organization, and I brought in bids from integrators representing products from Allworx, Cisco, NEC, Digium, and ShoreTel, as well as a Cloud-based solution.

Allworx had the lowest installed cost.

NEC and Digium solutions ranged 20%-25% higher.

ShoreTel?  2.5X.

Cisco… more than 3X.

Hosted solution?  It cost more per year than the Allworx solution.

By that I mean, not only was the installed cost of the hosted solution higher than Allworx, but you could afford to re-purchase the entire Allworx solution every single year for what the hosted solution cost.  Now granted, the hosted solution could tolerate a data center failure, since it’s serviced by separate regional datacenters.  That was its primary, albeit expensive, edge.

How reliable is the Allworx solution?

Availability can be measured in terms of nines of reliability. For example, the platinum standard is five-nines… 99.999%.  That’s a cumulative total outage of 5.26 minutes per year.  To put that in perspective, last year Google failed to achieve five-nines of availability in search.  Surprised?  Amazon and Netflix fared far worse.  In fact, even the public switched phone network fails to reliably achieve five-nines in a given year.  There are a lot of reasons why five-nines is hard to achieve, but to give you some perspective… the difference between four-nines and five-nines is 47 minutes per year.  The cost of targeting five-nines tends to grow exponentially as you approach it.  On the other side of the fence, phone system manufacturers love to tout five-nines.  And it’s not just a bullet point on their marketing material, because targeting five-nines adds complexity, hardware, licensing, and cost.  So while you may desire 99.999% availability, keep in mind that even if your system can conceivably reach this level of availability, you should make sure you have an obvious business case for it.  As in, unless your business lives and dies by the phone (e.g. a busy call center), you might be overbuying.  In my experience, my clients that have Allworx systems are generally seeing around 99.9% availability in a given year.
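
The arithmetic behind the nines is easy to check for yourself. Here is a minimal Python sketch, assuming a 365-day year and counting every minute of the year as service time (no maintenance windows excluded). The function and constant names are just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of outage per year allowed by a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for label, pct in [("three-nines", 99.9), ("four-nines", 99.99), ("five-nines", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.2f} minutes per year")

# The gap between four-nines and five-nines
gap = downtime_minutes_per_year(99.99) - downtime_minutes_per_year(99.999)
print(f"four-nines vs. five-nines: {gap:.1f} minutes per year")
```

The gap between four-nines and five-nines comes out to roughly 47 minutes, while three-nines (about where my Allworx clients land) allows a little under nine hours of cumulative outage per year.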

If you require anything resembling five-nines, you shouldn’t buy the Allworx 48X.  It’s just not designed with high-availability in mind.  Allworx makes no pretense about it… there’s simply no attempt at competing in the high-availability space.  Which means Allworx can spend their time and resources improving things that the target market values… bug-fixes, usability, and day-to-day feature requests.  Perhaps that’s why you can afford to buy a cold-spare Allworx 48X along with everything else, and still have a more competitive offering than the competition does, even before the competition adds high-availability capabilities.

Wright’s masterpiece, Fallingwater, was completed over a span of three years, overlapping two of his other great works from the late 1930s.  The original sketch for Fallingwater was actually drawn up while Kaufmann was on his way to meet with Wright and review the “completed” sketches of Fallingwater.  In defense of his genius (or arrogance), Wright said that Fallingwater was fully formed in his mind and that he was completely prepared for the meeting before drawing a single line.  Perhaps.  In comparison, the Allworx platform is the byproduct of more than fifteen years of engineering, funded by investors focused on building an innovative, low-cost, high-value unified communications platform that is essentially turn-key.  At the end of the day, Allworx provides you with a mainstream unified communications solution, at a price point that the competition can’t touch.

The above article is a continuation of Part I.  Some of the other platforms that were evaluated included Avaya’s IP Office, ShoreTel, Digium’s Switchvox, Trixbox, FreePBX, and Allworx.  If you’re interested in more of the detail surrounding the vendor selection process, just sign up for the newsletter here, and you’ll be able to download this article along with my deliverable (which includes some additional commentary).  I use the newsletter to send the occasional update, often with content that’s exclusive to the newsletter (e.g. the client-facing deliverable).  No spam ever.  Unsubscribe at any time.

Building a 2-Node ESXi Cluster with Centralized Storage for $2,500

What started as a simple goal… replacing my vSphere 4.1 whitebox with something that more closely resembles a production environment, became a design requirement for a multi-node ESXi lab cluster that can do HA, vMotion, DRS, and most of the other good stuff.  But do it without having to resort to using nested ESXi.  And I wanted an iSCSI storage array that was fast.  Not Synology NAS “kind of” fast under certain conditions… but something with SSD-like performance, and a bunch of space.  And I didn’t want to spend more than $2,500.  For everything.  In other words, what I really wanted was something akin to a 2-node ESXi cluster with a SAN backing it with 10TB of “fast” disk I/O.  Almost something like a Dell VRTX, but for my home lab.  And I wanted to spend less than one-tenth of what it might cost with Enterprise-grade gear.  As I iterated through hardware configurations, checking the vSphere whitebox forums, contrasting against the HCL, and running out of budget quickly, it became clear that the only way to do this was to figure out the storage piece first.

Key Design Decision

So the question became, do I really want to bother with having a “real” iSCSI storage array?  If not, then I could opt for multiple SSD drives locally with a hardware RAID, and just reconsider the need for a dedicated storage array entirely – or make some other compromises.  Sure, I would have had more IOPS than I would have known what to do with, and yeah, maybe I could have resigned myself to living in a nested ESXi world, but no.  As it turned out, someone had already done some good heavy lifting on this topic, and had a fast homemade iSCSI storage array with hardware RAID for $1,500.  No VAAI of course, but still… not bad for $1,500.  More than that though, that left $1,000 to put together a couple of hosts – so plenty, right?

Critical Path item – Inexpensive Storage

If the storage array as a build component was the biggest overall challenge, then raw storage was the critical path item in terms of procurement.  In order to come out with about 11TB of usable capacity, I needed 14 x 1TB drives in a RAID6, and I needed to spend less than $700.  Depending on when you read this article, that may be less of a challenge than it was in early 2014, but at the time $50 per TB on a 7200RPM drive was hard to come by.  Still harder was finding Hitachi Ultrastar 7200RPM drives with 32MB of cache – as they’re Enterprise-grade and not always around on eBay.  After a couple of months bidding on large lots, I eventually found 14 for $50 each.  Given no schedule constraints, I could have perhaps ended up with 2X as much storage for a reasonable cost premium – but 11TB of usable space in a RAID6 configuration exceeded my need, and kept me on pace to acquire most of the hardware on schedule.
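
The capacity and cost numbers above can be sanity-checked with a quick Python sketch. It assumes that RAID6 gives up two drives’ worth of space to parity, that the “about 11TB” figure reflects 1TB (decimal) drives being reported in binary units, and it ignores any filesystem or controller overhead. The names are just for illustration.

```python
def raid6_usable_drives(drive_count: int) -> int:
    """RAID6 reserves two drives' worth of capacity for parity."""
    if drive_count < 4:
        raise ValueError("RAID6 is normally built from at least four drives")
    return drive_count - 2


DRIVES = 14
DRIVE_TB = 1          # decimal terabytes per drive (10**12 bytes)
PRICE_PER_DRIVE = 50  # USD, the per-drive price from the eBay lots

usable_tb = raid6_usable_drives(DRIVES) * DRIVE_TB
usable_tib = usable_tb * 10**12 / 2**40  # the same space in binary units
total_cost = DRIVES * PRICE_PER_DRIVE

print(f"Usable: {usable_tb} TB decimal, roughly {usable_tib:.1f} TiB")
print(f"Drive cost: ${total_cost}")
```

With 14 drives, 12 hold data, which lands just under 11 in binary units, and the drives come in right at the $700 ceiling.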

Controller

The LSI MegaRAID 84016E is, albeit last-generation, a SAS/SATA II workhorse of a controller card.  It supports up to 16 drives and RAID levels 0, 1, 5, 6, 10, 50, and 60 at 3Gb/s per port, with a battery backup module and online capacity expansion, and it’s dirt cheap on eBay and nearly always available.  For my use-case, it compared favorably against going the local SSD drive route, or using the LSI 9260-4i, which costs 4X as much.  If you’re copying the build, be sure to pick up four Mini SAS (SFF-8087) Male to SATA 7-pin Female cables so that you can plug your SATA drives into the LSI 84016E.

Storage Array: Everything Else

This section covers almost everything else, because by now we’ve spent about 31% of the budget and many of the remaining decisions are mostly inconsequential.  Still, I opted for a Rosewill RSV-L4500, because I knew it would fit all 14 drives in it (and it does, and the design isn’t bad at all – easy enough for me to work in).  I did add a spare SSD drive as an OS drive and Level-2 Cache for PrimoCache (though the SSD was scavenged from another box).  For motherboard, CPU, memory, and power supply, I went with the ASRock 970 Extreme 3 ($65), an AMD FX8320 8-core CPU (because it was on sale for about $105), 32GB of the lowest-cost DDR3-1600 ECC RAM I could find, and a 750W Corsair HX750 power supply to power all of these drives. For the network interfaces, I had a spare Intel 2-port NIC lying around that I had picked up in a lot sale a couple of years ago.

Making the Storage Array Useful

There are a number of ways you can go with this.  Nexenta, Microsoft Storage Spaces, OpenIndiana, or FreeNAS, to name a few.  But I really wanted to take a look at Starwind’s SAN/iSCSI combined with PrimoCache.  Short version… you install Windows 2008R2/2012R2, configure the LSI controller software (MegaRAID), add Starwind’s SAN/iSCSI software and carve out some storage to expose to ESXi, then install PrimoCache and configure it to use as much RAM as you can for the Level-1 Cache (26GB), and as much SSD space as you can for the Level-2 Cache (120GB).  In my usage scenario, this seems to work pretty well – keeping the disks from bogging down I/O.

vSphere ESXi Cluster

With the remaining budget, I still needed to build two vSphere ESXi boxes.  As I started looking, I found it really challenging to come up with a lower-cost build that was still as good as the one I had just used: the ASRock 970 Extreme 3, AMD FX-8320 3.6GHz 8-core processor, 32GB of RAM, ATI Rage XL 8MB, Logisys PS550E12BK, spare Intel Pro/1000, and 16GB USB sticks…  So I built two more machines, installed vSphere ESXi 5.1 on the USB sticks, mounted the iSCSI volumes that I exposed in Starwind’s SAN/iSCSI, and began building out my vCenter box and templates.

Compromises to hit budget

In order to stay around $2,500 there were a few small compromises that I had to make.  The first was the hard drives… I couldn’t wait for the 2TB Hitachi Ultrastar drives.  While not significant for my use case, it is nonetheless noteworthy. Secondly, I ran out of budget to put a full 32GB of RAM in both of the hosts, so I have a total of 48GB across the two nodes, instead of the 64GB that I hoped for.  Finally, for the hosts, I bought the lowest-cost cases I could find – $25 each.  Aside from scavenging a few parts that I had lying around (an SSD drive, a few Intel Pro/1000), I managed to come in on budget.

Bottom Line

The project met all of my goals – a home lab, multi-node ESXi cluster with a dedicated iSCSI storage array that resembles a production environment – all on a budget of around $2,500.  I’m able to vMotion my VMs around, DRS is functioning, and Veeam Backup and Replication is working.  Better still, I can tear down and rebuild the environment pretty quickly now.  I didn’t really run into any show-stoppers per se, or real problems with the build.  If there’s interest, I’ll post some additional information about the lab in the future.  A big thanks to Don over at The Home Server blog for his work on Building a Homemade SAN on the Cheap, particularly in validating that you can actually buy decent drives in large lots on eBay at a discount, as well as for the motherboard recommendation, which was critical in hitting the budget.
