Wednesday, November 28, 2012

LDOM 3.0

Oracle has released Oracle VM Server for SPARC 3.0 (LDOM). Highlights:
  • Enhances the resource affinity capability to address memory affinity.
  • Retrieves a service processor (SP) configuration from the SP into the bootset, which is on the control domain.
  • Enhances the domain shutdown process. This feature enables you to specify whether to perform a full shutdown (default), a quick kernel stop, or force a shutdown.
  • Adds Oracle Solaris 11 support for Oracle VM Server for SPARC Management Information Base (MIB).
  • Enables a migration process to be initiated without specifying a password on the target system.
  • Enables the live migration feature while the source machine, target machine, or both, have the power management (PM) elastic policy in effect.
  • Enables the dynamic resource management (DRM) feature while the host machine has the PM elastic policy in effect.

I hope this release removes the restriction that prevented dynamic reconfiguration of resources after a live migration.
This release also seems to have been tested on upcoming SPARC processors: "7159011 M4/T5: Migration fails initialization on Logical Domains Manager startup" including Fujitsu Athena: "RFE: Cross CPU Migration support between Athena server and T series".

Oracle VM Server for SPARC Documentation

Friday, October 26, 2012

Solaris 11.1 available for download

Solaris 11.1 is now available for download from Oracle.

The usual SPARC/X86 [text|live|usb|AI] images are available, as well as repository images. There are also pre-upgrade repository images for those of you who are upgrading from Solaris 11 11/11 and have not upgraded to a recent SRU or do not have a support contract.

Oracle Solaris 11 Download

Wednesday, October 10, 2012

Solaris 11.1 announced

Solaris 11.1 has been announced and will be released later this month. It is the first update of Solaris 11 since its release in November last year. It contains a few interesting features; I have listed only a few here, but over 200 projects have been integrated into this release.

  • New Virtual memory subsystem (VM2.0 or parts of it)
    Scales beyond 100TB, predicts and adapts to page demand, higher performance
  • RSyslog
  • USB3 support
  • Install on UEFI/4K
  • Interactive install on iSCSI
  • FedFS
  • Parallel zone update
  • Much faster LOFI
  • FS statistics per zone
  • Physical to virtual Solaris 10 migration
  • Better support for shared storage
  • Remote Administration Daemon
    Secure zone administration with C, Java, Python API
  • VNIC config in zone XML
  • Faster install and attach
  • ASLR
  • OpenSCAP Security Compliance Checking and reporting tool
  • Audit remote server
  • Data Center Bridging (DCB) IEEE 802.1Qaz
  • Link aggregation spanning multiple switches
  • VNIC migration
  • Edge Virtual Bridging (EVB) support
  • High performance SSH
  • GRUB2
  • UEFI support
  • Improved hardware support
I asked the Solaris panel at Oracle OpenWorld about memory sets for zones. They are not part of this release, but might be implemented now that the initial part of VM2.0 is in place.

Solaris 11.1 What's new

Tuesday, October 9, 2012

SPARC T5, M4 and SPARC64-X

A short summary of the SPARC processor information that was disclosed at Oracle OpenWorld: in the near future Oracle will release two different SPARC processors, and Fujitsu will release a new SPARC64 processor with support for LDOM.

SPARC T5 (early next year)
  • 16 Cores 128 threads 
  • 28nm 
  • 25% higher thread performance than T4
  • 2.5x throughput compared to T4 
  • Scales from 1 to 8 processors 
  • PCIe Gen 3 
  • 8MB L3 cache
  • LDOM virtualization (as with all previous T-series)
  • Solaris 10 update 11 or Solaris 11.1
SPARC M4 (next year)
  • 6 cores 48 threads 
  • Scales to 32+ sockets 
  • 48MB L2 cache 
  • 28nm 
  • 3.6GHz 
  • 5-6x performance per socket compared to M-series 
  • LDOM virtualization
  • 32TB+ memory configurations
  • Solaris 11 only (but S10 support in LDOM)
SPARC64-X
  • LDOM virtualization
  • 16 cores 32 threads
  • 24MB L2 Cache
  • 3 GHz
  • On-chip DB floating point
  • Crypto acceleration 
  • Runs both S10 and S11 in lab

Wednesday, October 3, 2012

Oaktable world and OpenWorld 2012

I am attending both Oracle OpenWorld and ZFS Day/Oaktable world, and will post updates as soon as I get some spare time, or early next week at the latest.

It was great to talk and listen to the Joyent/Nexenta/illumos guys at Oaktable world.

Oaktable world
Oracle OpenWorld

Thursday, September 13, 2012

Oracle forces wesunsolve to close

In another blow against the community and the people who work with or have an interest in their products, Oracle has now forced wesunsolve to close.

Over the last two years it has been of tremendous value to administrators who find Oracle's support site hard to navigate and who want a good overview of patches and updates.

I see nothing Oracle can gain by doing this. No patches were available on the site, only metadata with links to Oracle's own support site for downloads (if you have an account with access to the patches).

Since they have already done a much, much worse thing by closing the OpenSolaris source, this comes as no surprise. The way companies treat innovation and community efforts is important when you choose an operating system or database engine. Tell your sales representative what you think about these things.

A huge thanks to the people who put time and effort into making wesunsolve. It was very useful and will be missed.

Sunday, August 19, 2012

Solaris 10 10/12 and T5-8

Oracle keeps quiet about upcoming Solaris releases and their features. I have, however, figured out the name (and planned release month) of what is possibly the last Solaris 10 update. It seems that s10u11 will be named Solaris 10 10/12, indicating an October release.

Not much is known about this release other than that it will probably feature a ZFS tech refresh, full OCM integration and live upgrade enhancements. It could possibly also support the new T5 processors, since they might be quite similar to the current T4 processors except for the doubling of cores.

I have also seen fragments of information indicating that Oracle was running Solaris 11 Update 1 on SPARC T5-8 machines as early as March.

Friday, August 17, 2012

Upcoming SPARC CPUs

The upcoming Hot Chips symposium's "Big iron" session will feature two future SPARC processors:

"SPARC64 X; Fujitsu’s new generation 16 core processor for the next generation UNIX servers

16-core SPARC T5 CMT Processor with glueless 1-hop scaling to 8-sockets"

The SPARC T5 is expected to be built using 28nm technology and to double the number of cores compared to the current T4 processor. The Sun Oracle server line should also include an 8-processor version, the T5-8, which will then have four times the number of cores (128) compared to the current T4-4 (32).
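
The core counts quoted above can be checked with a few lines of shell arithmetic. This is only a sketch: the helper name total_cores is made up for illustration, and the 8 cores per T4 socket is the figure implied by the 32 cores quoted for the four-socket T4-4.

```shell
# Hypothetical helper: sockets times cores per socket.
total_cores() {
    echo $(( $1 * $2 ))
}

t5_8=$(total_cores 8 16)   # T5-8: 8 sockets, 16 cores each (expected)
t4_4=$(total_cores 4 8)    # T4-4: 4 sockets, 8 cores each

echo "T5-8: $t5_8 cores, T4-4: $t4_4 cores, ratio: $(( t5_8 / t4_4 ))"
```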

This session will be held August 29, hopefully more information will surface afterwards. Otherwise it would be a safe bet to say that we will know more about the SPARC T5 after Oracle OpenWorld in October.

The Register has an article about both the T5 and the M4: Drilling into Oracle's performance boasts.

Saturday, July 28, 2012

Joyent presentations @ FISL 13

Joyent had several speakers at the FISL 13 conference, and presentations/slides are now available online.

Bryan Cantrill, Corporate Open Source Anti-Patterns: Doing It Wrong
video, slides
Bryan speaks his mind about corporate open source patterns with insights from Sun, the OpenSolaris project and Joyent. He does a bad job hiding what he thinks of Oracle ;)

Brendan Gregg, Performance analysis, the USE method
slides, video
Brendan on performance analysis using the USE method with good examples.

Update: Added Brendan's video.

Friday, July 20, 2012

Lots of packages for SmartOS, soon for OpenIndiana

There is now a huge package repository available for illumos-based distributions. Initially a dependency prevents it from running on OpenIndiana, but that is being fixed:

9000 packages available for SmartOS and illumos

The packages include a current PostgreSQL (9.1.3), MySQL, Apache, Ruby 1.9.3 and Python 3.2.3, both with lots of modules, plus many other useful packages.

All should work on SmartOS. Once it is fixed for OpenIndiana, this slightly modified procedure (without sudo, and with gtar installed first) should work as root:

# pkg install gnu-tar
# curl | (cd /; gtar -zxpf - )
# pkgin -y update
# pkgin avail | wc -l
# pkgin search ...
# pkgin -y install
I'll update this entry as soon as it works for OpenIndiana.

Good summary of enhancements in illumos ZFS

I found a good summary of enhancements to the free ZFS implementation in illumos: New features in open source ZFS

Also well worth a read is Matt Ahrens' post about the performance of the new async destroy: Performance of zfs destroy.

Thursday, July 5, 2012

OpenIndiana updated (oi_151a5)

A new pre-stable release of OpenIndiana was released a few days ago (oi_151a5), the fifth since the initial illumos-based oi_151a development release in September. Besides bug fixes and minor enhancements, this new release also includes a refresh of the illumos code base, which brings a few notable new features:

  • ZFS feature flags
  • Asynchronous destruction of ZFS file systems
  • ZFS send progress output

There have also been quite a few userland updates, all is documented in the release notes including a list of CVE-fixes:

OI_151a_prestable5 Release Notes

Update or download images here.

Monday, May 28, 2012

LDOM 2.2 released

A new release of LDOM, currently known as Oracle VM Server for SPARC, has been released. One of the major features of the new release is the ability to do live migration between SPARC T2, T3 and T4 processors. Enabling this feature does, however, have some performance impact:

From ldm(1M), describing the cpu-arch property:

"Specifies one of the following values:

generic uses common CPU hardware features to enable a guest domain to perform a CPU-type-independent migration.

native uses CPU-specific hardware features to enable a guest domain to migrate only between platforms that have the same CPU type. native is the default value.

Using the generic value might result in reduced performance compared to the native value. This occurs because the guest domain does not use some features that are only present in newer CPU types. By not using these features, the generic setting enables the flexibility of migrating the domain between systems that use newer and older CPU types."

Another major feature is SR-IOV support, which can enable bare metal I/O performance for logical domains, read more here: SR-IOV feature in OVM Server for SPARC 2.2

There is a new set of patches available to update the system firmware to 8.2.0, which is needed for the new features.

Announcing Oracle VM Server for SPARC 2.2 Release

Sunday, May 27, 2012

illumian available for download

A new illumos-based distribution, illumian, is available for download. illumian is based on the APT packaging system and is a successor to Nexenta Core Platform (NCP), which was built by Nexenta using the OpenSolaris source and APT packages.

The next version of NexentaStor (4.0) should also be built upon illumian; previous versions were built on NCP.

There is currently one image available for server text install on X86 hosts.

Thursday, May 24, 2012

ZFS feature flags and async destroy

The first features unique to the open ZFS implementation have been integrated into illumos. As discussed earlier, they are feature flags and async destroy of datasets.

illumos gate
ZFS feature flags update

Tuesday, May 22, 2012

Solaris 11 / SPARC News

Here is a good summary of a recent online forum, "Solaris 11: What's new since launch?"

Solaris 11 Update 1 (late this year)
  • Updated virtual memory subsystem (probably what has earlier been known as VM 2.0)
  • Faster Solaris 11 updates with improved Python performance
  • Already running on the upcoming T5/M4 SPARC(R) chips
  • VNIC configuration switches hosts together with its zones
There are also hints on what future Solaris 11/hardware updates might bring
  • Hotpatching similar to KSplice (Remember DUKS in Solaris 8?)
  • Offloading of compression and Oracle arithmetics to CPU besides crypto
  • Schedulers for DB or JVM workloads
Summary: What's new with Solaris 11 since the launch?

Wednesday, May 2, 2012

ZFS feature flags update

ZFS feature flags have been mentioned earlier, and the code is now available from Delphix so that it can be integrated into illumos. With this in place, new ZFS features can be implemented in a clean and compatible way. First out seems to be async destroy of datasets (feature flag com.delphix:async_destroy).

Hopefully we will see other new features soon after this is in place.

ZFS Feature Flags Presentation (PDF)
Feature flags webrev

Sunday, April 15, 2012


OmniOS, a new illumos-based server distribution with commercial support available, was announced at the DTrace conference.

It contains the features you expect like Crossbow, ZFS, DTrace, IPS and Comstar but also includes KVM and updates in userland (Python, GCC, Perl, OpenSSL etc.)

"OmniOS is our vision of what OpenSolaris could have been had it remained in the open. It runs better, faster and has more innovations,” continued Schlossnagle. “OmniTI did not want to lose the benefits that OpenSolaris technologies brought to customers, so we decided to pursue the continuation of the OS on our own. We've been running OmniOS in our data centers for six months and have seen tremendous results. We’re excited to announce our news at the DTrace conference because of its importance and relevance to this community."
- Theo Schlossnagle, CEO of OmniTI

More information, install images and source repositories are available here:

I have only installed the image into VirtualBox, which was painless and quick. I might post an update when I've had time for some exploring.

OmniTI Debuts OmniOS, an Open Source Operating System for the Solaris Community

Tuesday, February 21, 2012

S11 and S10 inside LDOM 2.1 on T4

I've finally managed to get some time to play with live migration on a pair of SPARC T4-2 servers. This post does not really add any new information, but it is a walk-through with some initial reflections. I am going to continue to write LDOM instead of Oracle VM Server for SPARC domains or something like that; even Oracle people still say LDOM, and everyone else knows what it is.

An interesting note is that I've used Solaris 10 as I/O and Control domain for the T4 servers while the LDOM is installed with Solaris 11 11/11. The disks for the LDOM are on LUNs over FC and MPxIO is used for multipathing from the I/O domain:

t42-01# dskinfo list-long
disk                                   size  lun  use    p  spd  type  lb
c0t5000CBA015B85D98d0                  279G  -    rpool  -  -    disk  y
c0t5000CBA015B93B90d0                  279G  -    -      -  -    disk  y
c0t50002870000254901593534030832420d0  33G   0x0  -      4  4Gb  fc    y
c0t50002870000254901593534030832420d0  33G   0x1  -      4  4Gb  fc    y
Examples of migrating and reconfiguring the LDOM while running:
t42-01# ldm list
primary active -n-cv- UART 16 16G 0.1% 12d 6h 37m
ldms11-01 active -n---- 5000 16 8G 0.0% 24m

t42-02# ldm list
primary active -n-cv- UART 16 16G 0.1% 12d 1h 26m

ldms11-01:~$ uptime
5:11pm up 19 min(s), 1 user, load average: 0.00, 0.00, 0.01
henrikj@ldms11-01:~$ prtconf -v |grep Mem
Memory size: 8192 Megabytes
henrikj@ldms11-01:~$ psrinfo | wc -l

t42-02# ldm set-vcpu 96 ldms11-01
t42-02# ldm set-memory 200G ldms11-01

t42-02# ldm list
primary active -n-cv- UART 16 16G 0.1% 12d 6h 50m
ldms11-01 active -n---- 5000 96 200G 0.1% 24m

ldms11-01:~$ prtconf -v |grep Mem
Memory size: 204800 Megabytes
ldms11-01:~$ psrinfo | wc -l
When performing a live migration between the two hosts, running processes and open network connections are, as expected, intact; only a small delay in the network traffic is visible. In my initial tests the delay was about 10 ms.

The live migration seems to work very well, and the T4 seems to perform several times faster than the T2/T3 for general workloads. The only thing missing is that LDOM 2.1 is unable to dynamically reconfigure memory and CPU resources for a domain after migration; a reboot is then required. Hopefully this will be fixed in the 3.0 release, which people at Oracle OpenWorld said would focus on removing current limitations (including migration between different types of sun4v processors).
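
The migration command itself is not shown in the listings above. As a rough sketch, assuming the standard ldm migrate-domain subcommand and its -n dry-run option, the helper below only prints the command lines rather than running them, so it can be read on a machine without the Logical Domains Manager. The helper name migrate_cmd and its "check"/"live" modes are made up for illustration; the domain and host names are the ones used in this post.

```shell
# Hypothetical helper: build the ldm command line instead of executing it.
migrate_cmd() {
    domain=$1
    target=$2
    mode=$3    # "check" adds -n (dry run); anything else is a real migration
    if [ "$mode" = "check" ]; then
        echo "ldm migrate-domain -n $domain $target"
    else
        echo "ldm migrate-domain $domain $target"
    fi
}

# Commands that would be run on the source host (t42-01):
migrate_cmd ldms11-01 t42-02 check
migrate_cmd ldms11-01 t42-02 live
```

Running the dry run first is a cheap way to find out whether the target host would accept the domain before anything is actually moved.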

Tuesday, January 3, 2012

The all-seeing eye of DTrace

I was recently involved with a problem related to backup software running on Solaris. As part of a general health check of the system, I stumbled on something interesting that was not visible using conventional tools.

This turned out to be a good opportunity to put my DTrace skills to work together with a few ready-made scripts. Once again it struck me how amazing this tool is: you can really see everything that is going on in your system and, as it turns out, you can even see problems that do not even exist. Since this was so much fun and a good example, I will walk through the steps again:

What caught my eye was the output from errinfo in the DTraceToolkit. A very high rate of system calls was returning with an error, namely close() with errno 9, "Bad file number":
whoami    ioctl          22         13  Invalid argument
init      ioctl          25        212  Inappropriate ioctl for device
awk       stat64          2        520  No such file or directory
java      lwp_cond_wait  62       3492  timer expired
processx  close           9  102073391  Bad file number
Syscall errors are in themselves normal and can be seen on any system, but usually not several thousand per second. As the error message indicates, this happens when close() is issued on a file descriptor (an integer) that does not represent an open file for that process, which at first glance seems like a rather useless operation.

We can also see that close() is by far the most used system call here:
# dtrace -q -n 'BEGIN { close=0;total=0 } syscall::close:entry
{ close = close + 1 } syscall:::entry { total = total + 1 } END
{ printf("%d close calls of %d total calls\n",close,total) }'

309530 close calls of 426212 total calls
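
To put those two counts in proportion, here is a small sketch that turns them into a percentage. The helper name close_share is made up for illustration; the numbers are the ones from the dtrace output above.

```shell
# Hypothetical helper: percentage of close() calls among all system calls.
close_share() {
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%.1f\n", c / t * 100 }'
}

# Counts taken from the dtrace output above.
close_share 309530 426212
```

In other words, close() accounts for roughly 73 percent of all system calls made during the sample.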
Looking at which file descriptors the process is trying to close shows an even distribution of close() calls between 0 and 65536. The only successful calls were to numbers lower than 1024, which are the numbers normally in use unless a process has a very large number of open files.
# dtrace -n 'syscall::close:entry { self->fd = arg0 } syscall::close:return
/ errno != 0 / { @failed = lquantize(self->fd,0,65536,16384) } syscall::close:return
/ errno == 0 / { @good = lquantize(self->fd,0,65535,1024) }'
dtrace: description 'syscall::close:entry ' matched 3 probes
           value  ------------- Distribution ------------- count
             < 0 |                                         7
               0 |@@@@@@@@@@@@@                            414811
           16384 |@@@@@@@@@                                294912
           32768 |@@@@@@@@@                                294912
           49152 |@@@@@@@@@                                294912
        >= 65536 |                                         0

           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12459
            1024 |                                         0
The processes responsible live only a short while, but by using dtruss I could trace system calls based on the process name:
 15889/1:  fork1()   = 0 0
15889/1: lwp_sigmask(0x3, 0x0, 0x0) = 0xFFFF 0
11385/1: getpid(0x0, 0x1, 0x1CD0) = 15889 0
11385/1: lwp_self(0x0, 0x1, 0x40) = 1 0
11385/1: lwp_sigmask(0x3, 0x0, 0x0) = 0xFFFF 0
11385/1: fcntl(0xA, 0x9, 0x0) = 0 0
11385/1: schedctl(0xFFFFFFFF7F7361B8, 0xFFFFFFFF7F738D60, 0x11A340) = 2139062272 0
11385/1: lwp_sigmask(0x3, 0x0, 0x0) = 0xFFFF 0
11385/1: sigaction(0x12, 0xFFFFFFFF7FFDEE20, 0x0) = 0 0
11385/1: sigaction(0x2, 0xFFFFFFFF7FFDEE20, 0x0) = 0 0
11385/1: sigaction(0xF, 0xFFFFFFFF7FFDEE20, 0x0) = 0 0
11385/1: getrlimit(0x5, 0xFFFFFFFF7FFDED90, 0x0) = 0 0
11385/1: close(0x3) = 0 0
11385/1: close(0x4) = 0 0
11385/1: close(0x5) = 0 0
11385/1: close(0x6) = 0 0
11397/1: close(0xFFFF) = -1 Err#9
The process issues close() on every number from 0x3 to 0xFFFF in a loop. As expected, the first few are actually open and are closed correctly, but the vast majority return error 9 (EBADF).

At the beginning of the trace we can see a fork, followed a little later by getrlimit(0x5,...). Looking up what that first argument to getrlimit() means:
# egrep "RLIMIT.*5" /usr/include/sys/resource.h
#define RLIMIT_NOFILE 5 /* file descriptors */
The process checks the limit on file descriptors and then closes the whole possible range, which seems a little unnecessary since almost none of them are open. But this was just after a fork, and a forked process inherits all the open files of its parent. This might not be what you want, so a close is in order. There is, however, no easy way of getting a list of all used file descriptors, so what we see here is a brute-force approach to making sure none are open before continuing. This would probably not have been noticed if it weren't for the unusually high limit on file descriptors.
# plimit $(pgrep processx|head -1) | grep nofiles
nofiles(descriptors) 65536 65536
Perhaps an iteration with close() over the contents of /proc/${PID}/fd would have consumed fewer resources in this scenario.
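
To give a feeling for the difference, the sketch below compares the number of close() calls a brute-force sweep needs with the number of descriptors a process really has open according to /proc. It only counts and closes nothing. The helper names open_fd_count and brute_force_calls are made up for illustration, and the script inspects the shell it runs in rather than processx. In a C program on Solaris, closefrom(3C) or fdwalk(3C) would be the more direct tools for the same job.

```shell
# Hypothetical helper: count the descriptors a process really has open
# by listing /proc/<pid>/fd (prints 0 if the directory cannot be read).
open_fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: a sweep from descriptor 3 up to limit-1
# issues limit-3 close() calls, open or not.
brute_force_calls() {
    echo $(( $1 - 3 ))
}

limit=$(ulimit -n)    # soft RLIMIT_NOFILE of this shell, may be "unlimited"
echo "descriptors open in this shell: $(open_fd_count $$)"
case $limit in
    *[!0-9]*) echo "descriptor limit is $limit, so a sweep has no sensible bound" ;;
    *)        echo "close() calls for a brute-force sweep: $(brute_force_calls "$limit")" ;;
esac

# With the limit seen in this post:
echo "sweep with a limit of 65536: $(brute_force_calls 65536) close() calls"
```

With the limit of 65536 seen here, every fork costs 65533 close() calls, while a freshly forked process typically has only a handful of descriptors open.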

All of this was done on a production system without impact on the applications, which is crucial: you must be able to trust that the tool will never bring your system down. DTrace can be trusted with this, whereas some platforms that lack it but try to provide somewhat similar observability fail. Read Brendan's blog: using systemtap, or the older but entertaining DTrace knockoffs.

Download the DTraceToolkit here.

ZFSSA/S7000 major update

The first major software update of S7000/ZFSSA/Fishworks in over a year is now available. Given the original version name "2011.Q1" it seems a bit delayed, perhaps due to the departure of several key people behind the software after Oracle's acquisition of Sun. New features in this release:
  • ZFS RAIDZ read performance improvements
  • Significant fairness improvements during ZFS resilver operations
  • Significant zpool import speed improvements
  • Replication enhancements - including self-replication
  • Several more, including bug fixes.
There are also a few integration features for Oracle, including RMAN and InfiniBand, if you happen to have an Exadata around.

The improved RAIDZ read performance comes from the hybrid raidz/mirror allocator in zpool version 29.

The ZFSSA is a fantastic product with probably the best interface and analytics available. But development seems to have stagnated a bit over the last year, and so have the blog posts with useful information and performance comparisons from the people behind it. And I still badly miss one feature: synchronous replication of datasets. Continuous replication is not always good enough.

Release Notes