Saturday 23 March 2013

You don’t have to be a Maverick to adopt OneFS7, but it helps

OneFS 7

Isilon’s latest operating system emerged from beta in Q4 2012.  During the beta Isilon chose the name Mavericks, which was a departure from many years of naming their betas after chillis.

As OneFS 7 has now reached the level of maturity where it will ship by default on new hardware, it seems a good time to see if OneFS 7 is a scorching Naga or a tepid Bell Pepper.

As someone who has used OneFS for many years, there are a number of key features in OneFS7 that jump out at me:

Roles-based Administration – Think of the current implementation very much as version 1.  Isilon will build on it and refine the granularity of permissions in future releases, but even in its current incarnation I am more than pleased to see it in the product.

Fast snapshot restoration, writable snapshots and file cloning.  All “enterprise” features that improve the ability of Isilon to compete in general purpose storage deployments against NetApp and others.

The excellent SyncIQ replication technology has been enhanced through the implementation of push-button failover / failback (this is something I need to spend more time playing with, but early signs are promising).

The IO capabilities of the filesystem have taken a step forward with much focus being placed on concurrent and sequential file access, but also improvements to IO latency – motivated in no small part by hopes of hosting VMware deployments on the platform.  The term “Endurant Cache” has been coined to cover Isilon’s new approach to caching and I am looking forward to delving into this in more detail in the near future. 

Unsurprisingly, tighter integration with VMware is a much-highlighted feature of OneFS 7.  Take VAAI and VASA APIs, sprinkle in some Endurant Cache, Metadata Acceleration (via SSDs) and per-file cloning and you have a platform that is better placed to handle VMware workloads.  Critically, though, there is still no deduplication or read acceleration to see off boot storms (read: PAM cards), so the VMware solution is not yet on a par with NetApp.

It would also be unfair not to comment on the continued progress being made by Likewise, the company acquired by Isilon in early 2012.

Likewise have done a great job in making the SMB/SMB2 implementation on Isilon far better than it was a couple of major revisions back.  There is still some way to go until the “unified” protocol access to the Isilon is as good as on some other platforms, but things are certainly going in the right direction.

Wrapping It Up

So where are we with OneFS 7?  That probably depends on your use case.

In a Windows-centric environment, OneFS 7 is probably a better fit than 6.5.x.  SMB / SMB2 performance should be much better and more tuneable.

In a UNIX-heavy environment, I would still favour OneFS 6.5.x.  Endurant Cache will help in many circumstances, but I’ve also heard that some operations may be slower in the current 7.0.x builds.

Undoubtedly OneFS 7 is an improvement on OneFS 6.5 and is testament to the significant investment that EMC continue to make in this true scale-out storage platform, but in my opinion it's still a little too new for prime-time.  There will certainly be deployment scenarios where you could deploy and run today (Windows-centric or VMware).  Outside of those, you’d need to be a Maverick to deploy into production so soon after launch.

Based on previous Isilon release schedules, I would expect a substantial OneFS 7 point release by Q4 2013 that will bring with it additional features and stability.  In the meantime, I would recommend that Isilon customers grab a copy of the new OS and try it where they can.

I'll post in more detail about the new features in future blog posts.

Thursday 7 March 2013

Sometimes you need to flush

One of the great things about the 200-series nodes (X and S) is that you can specify how much memory and how many SSDs you want in a node.  Fantastic! I can put 48GB of RAM and 2 SSDs (for metadata acceleration) in an X200 node to host my commodity data and 96GB of RAM and 4 SSDs in an S200 node to support my high-performance storage requirements.

The issue here is that you could potentially be the first / only customer running a particular config.

So what happens when you send a shut down command to a node with 96GB RAM running OneFS 6.5.x?  From some testing I ran at the start of this year, it looks about 70 / 30 that the node will shut down as expected.  In the minority of cases the shut down is aborted due to a timeout flushing data from memory.

To work around this issue you can run isi_flush before issuing the shut down command.  In my testing, flushing before shut down raised the success rate to 100%, so we have a fix until we have a fix.


As you might expect, you can run isi_flush through isi_for_array to flush all nodes in a cluster prior to a shut down.

isi_for_array "isi_flush"

Interestingly, only the shut down command is impacted by the memory flush, reboots always work - go figure.
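
If you want to make the flush-then-shut-down order hard to get wrong, the two steps can be wrapped in a small shell function.  This is only a rough sketch: the function name flush_then_shutdown and the FLUSH_CMD / SHUTDOWN_CMD variables are my own invention, and the shut down command is left as a placeholder because the right one depends on how you normally shut your cluster down.

```shell
#!/bin/sh
# Sketch only: flush all nodes first, and run the shut down command
# only if the flush reported success.
# FLUSH_CMD defaults to the command shown above.
# SHUTDOWN_CMD is a placeholder (assumption); replace it with the
# shut down command you normally use on your cluster.
FLUSH_CMD=${FLUSH_CMD:-'isi_for_array "isi_flush"'}
SHUTDOWN_CMD=${SHUTDOWN_CMD:-'echo "replace with your shut down command"'}

flush_then_shutdown() {
    if sh -c "$FLUSH_CMD"; then
        # Flush succeeded, so it should be safe to proceed.
        sh -c "$SHUTDOWN_CMD"
    else
        echo "flush failed; not shutting down" >&2
        return 1
    fi
}
```

Source the file, set SHUTDOWN_CMD, and call flush_then_shutdown.  If the flush fails for any reason the shut down is skipped, which is exactly the case you would want to look at by hand.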