Saturday, July 30, 2011

Nexenta Core Platform 3.1

Nexenta is about to release version 3.1 of NCP (Nexenta Core Platform). Existing installations can already be upgraded, but no install images are available yet. NCP 3.1 is the foundation for NexentaStor 3.1, Nexenta's latest release of their software storage appliance.

The new 3.1 version is still based on a patched OpenSolaris build 134 codebase. Changes in this release include an updated ZFS pool version (28), WRITE SAME/UNMAP support and probably many bug fixes.

The 4.0 release of NCP, currently under development, will be based on illumos, which contains many additional features.

NexentaStor 3.1 available now

Monday, July 25, 2011

c0t0d0s0 closes down

The well-known Solaris blog c0t0d0s0, run by Oracle employee Joerg Moellenkamp, has now closed. Joerg gives no exact explanation of why it was shut down, but he does list several reasons that were not behind the decision.

It is sad to see a good Solaris-related blog go away; we served very much the same interest group. I would not be surprised if Oracle has something to do with all this, as it all feels a bit unclear and sudden. Oracle has never been very fond of unstructured information being released from the company. Before the acquisition Joerg used to write posts about upcoming Solaris features, much the same as I do whenever that kind of information is released publicly.

Thanks for these years, Joerg!

The LKSF book (Less Known Solaris Features) will be kept online here.

Tuesday, July 5, 2011

SPARC T4 information and Beta program

Oracle has announced a beta program for the upcoming SPARC T4 processor, which Oracle has previously disclosed should provide three to five times the single-thread performance of current T3 systems.

"The aim here was to develop a processor core that would provide high-speed, single-thread performance while also addressing the needs of applications that benefit from the high efficiency and throughput of multithreaded cores. The SPARC T4 is up to five times faster than the SPARC T3 for single-threaded functions," says Rick Hetherington, vice president of hardware development at Oracle. "It's breakthrough technology for us."

Combined with on-chip crypto acceleration, a massive thread count and LDOM capability, it looks like it will be the most interesting SPARC processor since the never-released UltraSPARC RK, Rock.

The new processor will also dedicate cores to critical software threads with a new Solaris API:
"The new systems use 'critical thread API', or the ability of the Solaris operating system to recognize critical threads in applications and assign them, by themselves, to a single processor core. This allows the critical threads to run at the very highest performance levels without competing with other less critical threads. This delivers faster overall performance by accelerating the more critical components in threaded applications."

Hopefully these advancements, at least in combination, will deliver more broadly on the T2/T3 promise of massive throughput without sacrificing, or being limited by, single-thread performance. That would make the T4 a very suitable general-purpose processor that still has the advantage of handling massive amounts of load.

Be the First to Test Next-Gen SPARC Systems
Conversations with Oracle Innovators, Rick Hetherington
What's Inside the New SPARC T-Series Processors?
Oracle SPARC roadmap

Sunday, July 3, 2011

Cheap, fast and secure storage

This post is about how I protect important data at home. There are a lot of appliances out there with nice front ends, but most of them do not store data on ZFS or anything with the same level of data protection. Also, if you have anything but trivial space requirements they tend to get expensive fast, and they don't offer an easy way to boost performance, for example by adding an SSD as a cache or adding other hardware. I've built my ZFS storage server around a quad-core AMD CPU, 8GB of ECC memory, a SAS HBA and a bunch of large SATA disks in hot-swap bays.

All my data now lives on this storage server, protected against accidental deletion, bit rot, disk failures and fire. I use NFS/CIFS for ordinary file data and iSCSI for my Aperture photo libraries and Time Machine backups.

All data is stored in one large raidz2 pool with double parity, so any two disks can fail without data loss. Since ZFS checksums all data, I know it is intact when it is read and after the scrubs that run every two weeks. The most important reason for using raidz2 is that disks have become so big that there is a real risk of an unrecoverable read error during a resilver, when all data is read; the second parity disk makes data loss from this highly unlikely. Snapshots make it possible to do a quick rollback if any person or software damages or removes data, which is also very useful when transforming large amounts of data with an uncertain outcome. This keeps data safe from most user errors, disk errors and controller errors, but fire and major user errors or sabotage (zfs destroy -r) could still make me lose data.
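The resilver risk can be put in rough numbers. The sketch below assumes a datasheet unrecoverable read error (URE) rate of 1 in 10^14 bits, a common figure for consumer SATA disks, independent errors, a completely full pool (ZFS only resilvers allocated blocks) and a hypothetical pool of six 2 TB disks. It computes the chance of at least one URE while reading every surviving disk after a single failure, which on raidz1 would hit a block with no redundancy left to repair it from.

```python
import math

TB = 10**12  # disk vendors quote decimal terabytes


def p_ure(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE when reading `bytes_read` bytes,
    assuming independent errors at the datasheet rate `ure_per_bit`."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, via log1p/expm1 to stay accurate for tiny p and huge n
    return -math.expm1(bits * math.log1p(-ure_per_bit))


def resilver_read_bytes(n_disks: int, disk_bytes: float) -> float:
    """Bytes read from the surviving disks when one disk is replaced,
    assuming the pool is full."""
    return (n_disks - 1) * disk_bytes


if __name__ == "__main__":
    read = resilver_read_bytes(6, 2 * TB)  # hypothetical 6 x 2 TB pool
    print(f"P(at least one URE during resilver) = {p_ure(read):.0%}")
```

With these assumptions it prints 55%. Datasheet URE rates are specification limits and real drives are commonly reported to do better, so treat this as a pessimistic estimate; on raidz2 a single URE during a one-disk resilver can still be corrected from the second parity.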

To guard against the latter two scenarios, I mark my most important datasets with a flag, and a script streams them to external disks over eSATA using ZFS send/receive. The disks are then moved to a second physical location. I can currently fit all critical data on one large SATA disk, which makes this cheap and easy; I exclude ISO images and virtual machine disks that I only use for testing. A full backup of critical data currently takes about 3 hours, limited by the backup disks, which can write at about 80-90MB/s. With incremental ZFS send the time drops considerably, since only the delta between two snapshots needs to be transferred.
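The backup figures are easy to sanity-check. Assuming decimal megabytes and that the backup disk's write speed is the bottleneck for the whole run, three hours at 80-90MB/s corresponds to roughly 0.86-0.97 TB, which matches critical data fitting on one large disk. The 20 GB incremental delta below is a hypothetical figure for illustration.

```python
MB = 10**6
GB = 10**9


def data_for_duration(hours: float, mb_per_s: float) -> float:
    """Bytes written in `hours` at a sustained `mb_per_s` (decimal MB)."""
    return hours * 3600 * mb_per_s * MB


def duration_minutes(nbytes: float, mb_per_s: float) -> float:
    """Minutes needed to write `nbytes` at a sustained `mb_per_s`."""
    return nbytes / (mb_per_s * MB) / 60


if __name__ == "__main__":
    for rate in (80, 90):
        print(f"3 h at {rate} MB/s ~ {data_for_duration(3, rate) / GB:.0f} GB")
    # hypothetical delta between the previous and the new backup snapshot
    print(f"20 GB incremental at 80 MB/s ~ {duration_minutes(20 * GB, 80):.1f} min")
```

This prints 864 GB, 972 GB and 4.2 min, which shows why incremental sends make frequent backups practical.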

To be able to recover individual files, and to recover parts of the data even if a backup disk has errors, the streams are received into a zpool on the backup disk. Using several disks means I always have at least one at another location, and it also gives me multiple backup versions. I considered storing encrypted ZFS streams on the disks instead, but then it is not possible to recover individual files, and if a stream is damaged it becomes useless.
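A flag-driven backup script of this kind could look roughly like the minimal sketch below, written in Python around the standard zfs commands. It is not the actual script described above: the user property com.example:backup, the pool name backup and the snapshot names are hypothetical (ZFS accepts any user property of the form module:property). The pure functions only build command lines; run_pipeline pipes zfs send into zfs receive and needs a system with ZFS and suitable privileges.

```python
import subprocess
from typing import List, Optional, Tuple

# Hypothetical flag, set with e.g.: zfs set com.example:backup=on tank/photos
BACKUP_PROP = "com.example:backup"


def zfs_list_output() -> str:
    """Return `zfs list -H` output: one 'name<TAB>value' line per filesystem."""
    return subprocess.run(
        ["zfs", "list", "-H", "-t", "filesystem", "-o", f"name,{BACKUP_PROP}"],
        capture_output=True, text=True, check=True,
    ).stdout


def marked_datasets(listing: str) -> List[str]:
    """Keep datasets whose flag is 'on' (unset user properties show as '-')."""
    marked = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t")
        if value == "on":
            marked.append(name)
    return marked


def send_recv_argv(dataset: str, new_snap: str, backup_pool: str,
                   prev_snap: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Build argv lists for `zfs send` and `zfs receive`.

    Without prev_snap a full stream is sent; with it, only the delta between
    the two snapshots (`zfs send -i`). 'tank/photos' is received as
    'backup/photos'; parent datasets must already exist on the backup pool.
    """
    send = ["zfs", "send"]
    if prev_snap is not None:
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(f"{dataset}@{new_snap}")
    relative = dataset.partition("/")[2] or dataset
    # -F rolls the target back to its most recent snapshot before receiving
    recv = ["zfs", "receive", "-F", f"{backup_pool}/{relative}"]
    return send, recv


def run_pipeline(send: List[str], recv: List[str]) -> None:
    """Equivalent of `zfs send ... | zfs receive ...`."""
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv, stdin=sender.stdout)
    sender.stdout.close()  # lets the sender get SIGPIPE if the receiver exits early
    receiver.wait()
    sender.wait()
    if sender.returncode != 0 or receiver.returncode != 0:
        raise RuntimeError(f"send/receive failed for {send[-1]}")
```

Before each run such a script would take a new snapshot of every marked dataset with zfs snapshot and pass the snapshot from the previous backup as prev_snap, so only the first backup to a fresh disk sends a full stream.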

In an ideal world I would have another node set up that receives the incremental ZFS streams over the net, but that is overkill for my current usage and I have no secondary site with good bandwidth (and another storage server).

This setup gives me the following redundancy:
  • Integrity of all data is verified every two weeks
  • Data has several read-only snapshots from different times
  • Data is protected by two disk parity raidz2
  • Accessed data is always verified by checksums
  • Offsite backups allow disaster recovery
  • Backups are also checksummed
  • Memory is ECC protected to prevent data corruption
This is all fine, but there is still one single point of failure: if a serious bug crept into the ZFS code, it could be replicated to all snapshots and pools, although given the amount of testing ZFS has gone through this seems unlikely. This is where tape backups over NDMP would be useful, but since I do not have any tape hardware, all important data is copied with rsync to a disk with an old-school filesystem once every other month.
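One way to make that rsync copy reflect a single point in time is to copy from a snapshot rather than the live filesystem. This is a hedged sketch, not the author's actual command; the mount point, snapshot name and destination path are hypothetical.

```python
import subprocess
from typing import List


def snapshot_rsync_argv(mountpoint: str, snapshot: str, dest: str) -> List[str]:
    """Build an rsync command that copies one ZFS snapshot to `dest`.

    A snapshot is readable under <mountpoint>/.zfs/snapshot/<name>, even when
    the snapdir property is 'hidden'. The trailing slash on the source makes
    rsync copy the snapshot's contents rather than the directory itself.
    """
    src = f"{mountpoint.rstrip('/')}/.zfs/snapshot/{snapshot}/"
    # -a: archive mode (recursive; keeps symlinks, permissions, times, owner, group)
    return ["rsync", "-a", src, dest]


def copy_snapshot(mountpoint: str, snapshot: str, dest: str) -> None:
    """Run the copy; raises CalledProcessError if rsync fails."""
    subprocess.run(snapshot_rsync_argv(mountpoint, snapshot, dest), check=True)
```

Because the source is read-only, files cannot change halfway through the copy, which keeps related files consistent with each other on the non-ZFS disk.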

On top of this I also take advantage of other ZFS features: a cheap SSD is used as L2ARC to accelerate various workloads, and compression and deduplication are available as always with ZFS. It is also possible to add new hardware without buying a different server or license; 10GbE, Fibre Channel, more SSD caches and more RAM for cache/dedup can easily be added. That would probably not be possible with a pre-built NAS appliance, or at least not as cheaply.
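How much RAM dedup needs can be estimated from the number of unique blocks. The sketch below uses the commonly quoted rule of thumb of roughly 320 bytes of memory per dedup table (DDT) entry; that figure is an approximation rather than an exact specification, and the 128 KiB (default recordsize) and 8 KiB average block sizes are assumptions for illustration.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def ddt_ram_bytes(unique_bytes: int, avg_block: int = 128 * KIB,
                  bytes_per_entry: int = 320) -> int:
    """Rough in-core DDT size: one entry per unique block (rule-of-thumb size)."""
    entries = -(-unique_bytes // avg_block)  # ceiling division
    return entries * bytes_per_entry


if __name__ == "__main__":
    for block in (128 * KIB, 8 * KIB):
        gib = ddt_ram_bytes(TIB, block) / GIB
        print(f"1 TiB unique data, {block // KIB} KiB blocks: ~{gib:g} GiB of DDT")
```

Under these assumptions 1 TiB of unique data needs about 2.5 GiB for the table with 128 KiB blocks but about 40 GiB with 8 KiB blocks, which is why dedup on small blocks calls for a lot of RAM or an L2ARC device.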

I am evaluating the OpenIndiana 151 beta on the storage server after upgrading from the now-dead OpenSolaris distribution (I would not have tested a beta release without all these backups in place), and so far everything works fine. Solaris 11 Express can also be used, but it requires a license from Oracle that costs about $1000/year; in return it gives you ZFS crypto and a few other ZFS features not available in the open ZFS code base.

All the technical features are better than those of most storage appliances, but OpenIndiana and Solaris 11 Express provide no web-based administration. There is, however, add-on software such as napp-it, and commercial ZFS software appliances such as NexentaStor, which has a free Community Edition for up to 18TB of used storage.

I have designed and implemented various similar solutions, from small office filers to larger data archives with 96 disks. I work part time as a consultant, so I am available to assist with similar projects.