Tuesday, December 2, 2008

OpenSolaris 2008.11

OpenSolaris 2008.11 was just released for x86, and it is a huge improvement over the previous release. I'm not really a desktop user, but I think the desktop features are at least on par with Ubuntu and other similar environments. It has automatic network management, Songbird, Compiz, Transmission, OpenOffice and a graphical package/update manager, among other features.

On top of that it has a unique feature for end users called "Time Slider", which uses continuous ZFS snapshots to browse changes or restore lost data. It also has NetBeans 6.5, Sun Studio, Ruby, Python, PHP and Java 1.6 available in the repositories, making it an attractive choice for developers who can now use DTrace in a modern desktop environment.

All of this is built on top of snv101b with ZFS as the root filesystem, with additional storage features besides ZFS (COMSTAR, iSCSI, CIFS) and virtualization (xVM, Solaris zones, VirtualBox).

The packaging system (IPS) has been long overdue. With OpenSolaris we now have network-based repositories from which software and updates can be installed, so you no longer need to download all those packages yourself and resolve the dependencies by hand.

Take a peek here before downloading it here.

Perhaps it sounds like I'm working for marketing at Sun, but it's all that good ;)

Thursday, November 6, 2008

SYSV Packages fundamentals

If you are new to Solaris or never bothered looking at how SYSV packages are built up, I found a good article at BigAdmin that could be worth a read:

Introduction to Package Components for the Solaris OS

I found this article while researching why a file was not updated when patching and upgrading a system. It turned out that the file is flagged as an editable file, which means it will not be overwritten by the package system. Such files should be handled by a class action script that is responsible for making the required changes to the file.

In this particular case we had problems with OPL servers (M5000, M8000) that did not have a working console login. We had installed them with a flash archive created from a Solaris release predating OPL support, so the /etc/iu.ap file used by autopush had not been updated with the information required for OPL hardware.

Monday, November 3, 2008

Follow the white rabbit

Yesterday I encountered a strange problem which was very entertaining to debug, and dtrace came to the rescue as always. I had done a fresh install of Solaris Nevada build 101 on an AMD64 machine at home. The installation went fine and I started to configure the host, but after a while I began to get error messages like "couldn't set locale correctly". I checked the SYSV package containing my locale, en_US.ISO8859-1:

# pkgchk SUNWlang-en-extra
ERROR: /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
pathname does not exist

The installation logs showed that the package had been installed correctly. I tried to reinstall the package, which temporarily fixed the problem, but after 10 minutes or so the file was missing again. I repeated this twice to see if there was any time pattern, but it gave me no clue as to why the file was removed. I decided to use both BSM auditing and dtrace to see what might be causing the removal of the file. I waited and waited, but the file stayed there, so I started to use the system as usual, and that somehow triggered the removal of the file!?

A dtrace script created by Chris Gerhard showed me what caused the removal of the file:

# ./delete.d
dtrace: script './trace.d' matched 2 probes
0 75683 unlink:return man prctl
UID 0 PPID 11244 sh

Audit also logged the unlink of the file; following that trail could also have shown me that man was responsible:

header,242,2,unlink(2),,ollespappa,2008-11-03 02:52:01.096 +01:00
subject,arne,root,root,root,root,15984,2799982011,1330 5632 xx.xx.xx.xx

So man(1) unlinked the file, which did not make any sense at all. A dtrace one-liner gave me more information:

# dtrace -n 'pid$target:libc:unlink:entry { printf("unlink(%s)\n",copyinstr(arg0)); ustack(); }' -c 'man ls'
1 76165 unlink:entry unlink(/usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3)

The unlink was called from the format function of man. Looking at the source, I identified which line was responsible for the unlink:
2754 (void) unlink(tmpdir);

After some more investigation I realized that the argument string passed to unlink might never get initialized, in which case it holds whatever happened to be in memory. On this system that happened to be a string representing the path to a file which could be unlinked. This occurs only if the manual page is in a format other than SGML, since the variable is initialized inside an SGML-related if-statement:

2612 if (sgml_flag == 1) {
2613 if (check_flag == 0) {
2614 strcpy(tmpdir, "/tmp/sman_XXXXXX");

This bug seems to have been in Solaris for ages, but until now the uninitialized string had probably never happened to contain anything that could be unlinked. It could perhaps be the switch to Sun Studio 12 in snv100 that triggered this behavior.

This is now CR 6767074.

Sunday, November 2, 2008

Update 6 released and attach/detach (updated)

After a long wait, update 6 of Solaris was released a few days ago and is available for download. All new features are listed here. As I have mentioned earlier, the bulk of the changes are in the major ZFS update, including ZFS boot. This is a great update to an already good OS, but I was extremely disappointed to see that they have not yet fixed the attach/detach bug related to patch history; I have been waiting for this fix for well over a year. In this update bugid 6637869 (zone attach doesn't handle obsolete patches correctly) is fixed, but a new one is listed in known issues, bugid 6710545 (fix for 6550154 doesn't always ignore older patches).

What this means is that it is still not enough to be on the exact same patch level on both machines when moving zones in a supported way: all patches also have to be applied in the same way and have the same history, otherwise it might not work. I don't know what update on attach can do for this, but that is not the point; attach is still broken.

Update: Gerald kindly informed me that this is indeed fixed in Solaris 10 10/08 (137137-09), but it is still listed as an issue in the release notes, which I based this post on.

Thursday, October 23, 2008

Too tight diamond shoes

I encountered a terrible problem today: one of our hosts had more resources than it should. Fortunately it was a T5120, so it was easy to disable hardware from the SP:

Remove half of the strands, 32 in total (4 cores), by repeating this for every strand from P63 down to P32:
-> set /SYS/MB/CMP0/P63 component_state=Disabled

Remove half of the memory, disable memory branch BR3 and BR2:
-> set /SYS/MB/CMP0/BR3/CH0/D0 component_state=Disabled
-> set /SYS/MB/CMP0/BR3/CH0/D1 component_state=Disabled

Reset the system to apply the changes:
-> reset /SYS

Tuesday, September 16, 2008

Sun is readying Solaris 10/08

The release of Solaris 10/08 is getting close, and the sun.com website has some updates on which new features we can expect in this release. I have mentioned some of them earlier, but this is what Sun wants to highlight:

xVM guest support
Major ZFS update including ZFS boot
New HW support including Intel Xeon 7400
LDOM enhancements (MPxIO, dynamic I/O reconfiguration)
X64 Fault management enhancements
NVIDIA SATA enhancements

I sure hope bugid 6637869 is finally fixed in this release so migrating zones will be practical in a supported way. (6637869 zone attach doesn't handle obsolete patches correctly)

This release will be a great leap forward for patching and upgrading when used together with ZFS snapshots and live upgrade. Always being able to roll back changes in a matter of minutes when upgrading or patching, without adding extra disks and copying data for hours, will be of tremendous value in a production environment. This update also addresses the need to place Solaris Zones on a ZFS filesystem and still be able to upgrade and patch without restrictions.

I'll post an update when I get my hands on this new release in mid-October.

Thursday, August 7, 2008

iPhone software update

Apple has released an update to the iPhone OS, 2.0.1, and it seems to have fixed most of the problems I have had with the device. It has not crashed in two days now, and applications also work much better. The only problem so far is Safari, which has gone down about once a day. Hopefully they will also update iTunes soon, since there are some very irritating problems with syncing and application updates: iTunes wants to update all of my applications for no reason, and several copies of the same application get downloaded.

Too bad the sync process takes such a long time to complete with 2.0. It could probably have been at least a bit better if they had not abandoned FireWire for the iPhone, which is much faster than USB 2.0 at data transfer.

But overall I am happy. I now have a device I can depend on again; it's a nightmare to be on vacation with a flaky smartphone ;)

Sunday, August 3, 2008

Software releases and iPhone

We all know how irritating it can be when software is not released on schedule, but the alternative is often way worse. I bought a 3G iPhone on launch day, and since then I have grown more and more frustrated with the quality of the new 2.0 OS. Application crashes, reboots and slowdowns: it's not what I was expecting from Apple. It has now been over three weeks without any updates or relief, so let's hope the next update will be worth the wait. Today I use my iPhone several hours a day, and I get at least one reboot, several crashes and a few slowdowns (solved by a reboot, Windows style) every day. I love the device and all its features, but this is not acceptable for a device that you depend on so much.

When all this is solved, they only need to integrate some standard features that all other smartphones have, such as MMS, Bluetooth file transfer, cut & paste and the ability to send contacts to other phones.

Tuesday, June 24, 2008

Solaris 10 Update 6 wishlist

With only about two months left (let's say three) until the next update of Solaris 10, it would be interesting to know which changes will be incorporated in this release. All I have seen so far is that it will contain a large ZFS update including boot. I have made a little wish list with a few needed enhancements that could probably be part of it:

Zone update on attach (PSARC/2007/621)
Attach/detach between sun4u and sun4v (6480464)

This will make life a lot easier for those of us who have deployed large numbers of zones, making it easier to move zones and update them incrementally. Today a zone must be upgraded/patched together with its global zone. With update on attach you could move zones to an already upgraded node and attach them one by one, thereby minimizing total downtime per zone. This can be quite handy in an environment with 20+ zones per node (live upgrade is not always an option).

This one I am not so sure will make it, but we can always hope for the best and prepare for the worst ;)

Network Vanity naming (PSARC 2006/499)

Vanity naming would be great for moving detached zones to a new host. Today you have to change the zone configuration if the new host has a different NIC or has the NIC configured differently. With vanity naming the zone could just bind to the names public and backup instead of e.g. bge3 and bge8.

And a long standing bug fix would also be greatly appreciated:

6637869 zone attach doesn't handle obsolete patches correctly

I've had problems with this bug when moving zones from an upgraded Solaris 10U4 to a freshly installed S10U4; a clean attach currently does not work.

Wednesday, June 4, 2008

I used to think that the day would never come

SNV90 with ZFS boot is finally here. I've already installed it on an x64 system, and tomorrow it is time for my SB1000 to get a new filesystem for root.

Some screenshots from the x64 installation: 1 2 3 4
And here is my SPARC workstation cleanly installed:
$ uname -a
SunOS precursor 5.11 snv_90 sun4u sparc SUNW,Sun-Blade-1000
$ zfs list
rpool              10.9G  56.1G    64K  /rpool
rpool/ROOT         5.87G  56.1G    18K  legacy
rpool/ROOT/snv_90  5.87G  56.1G  5.87G  /
rpool/dump         2.00G  56.1G  2.00G  -
rpool/export         38K  56.1G    20K  /export
rpool/export/home    18K  56.1G    18K  /export/home
rpool/swap            2G  58.1G  2.68M  -

This is now a full ZFS boot configuration, unlike the one previously available in OpenSolaris 2008.05, where swap is still located on a separate slice. Now ZFS should be able to enable the disk write cache, since the whole disk is used for ZFS. The disk configuration is also much cleaner and simpler. Another nice thing is that changing the size of swap is as easy as `zfs set volsize=96G rpool/swap`, albeit with the need for a reboot, or possibly a remove and re-add of the swap device, before it takes effect.

I tried creating a live upgrade boot environment: one command and a few seconds later I had an alternate boot environment ready for upgrades, patches or other changes. lucreate took a snapshot of my current root filesystem and created an entry in GRUB. So besides the installer, live upgrade also fully understands ZFS filesystems now.

I used to think that the day would never come
I'd see delight in the shade of the morning sun
My morning sun is the drug that brings me near
To the childhood I lost, replaced by fear
I used to think that the day would never come
That my life would depend on the morning sun...
True Faith - New Order

Monday, June 2, 2008

ZFS boot!

Looks like ZFS boot is finally coming to Solaris 10, and with it wonderful, secure upgrades and patching. ZFS root support has been available for some time in Solaris Nevada [X86|X64|IPE], as already seen in use in project Indiana (OpenSolaris 2008.05). It is now soaking into Nevada for SPARC as well: the last ZFS boot parts were integrated in SNV88, and the support in the non-open-source parts (installer) is supposed to be available in SNV90, which is due in a week or so.

Even better, it looks like Solaris 10 Update 6 will contain this functionality, according to a Sun presentation:

ZFS Root (S10 Update 6)
● Brings all the ZFS goodness to /
● Checksums, compression, replication, snapshots and clones
● Boot from any dataset
● Patching becomes safe
● Take snapshot, apply patch... rollback if you don't like it
● Live upgrade becomes fast
● Create clone (instant), upgrade, boot from clone
● No “extra partition”
● 10U6:Based on new Solaris boot arch.
● ZFS can easily create multiple boot environments
● GRUB can easily manage them

The whole presentation is here.

I can't wait to apply my first set of patches, or perform an upgrade on a ZFS snapshot. Or why not take a daily snapshot of /, and perhaps one before you install that 3rd party software....

Easy, cheap, fast and safe modifications to the OS and boot environment!

Thursday, May 15, 2008

Implementation of Zones with ZFS

While I work with zones in production environments every day, they don't reside on ZFS yet for support reasons (upgrade). But on my Ultra 2 I have created a simple but very useful setup based on ZFS, zones and resource controls, with the zones placed on ZFS.

I divided the server into functional parts: web, mail and users, each has its own separate zone.
storage/zones/mailzone                            758M  20.5G   717M  /zones/mailzone
storage/zones/mailzone@stable20080502            41.0M      -   693M  -
storage/zones/userzone                            167M  20.5G   128M  /zones/userezone
storage/zones/userzone@stable20080502            39.2M      -   166M  -
storage/zones/webzone                            2.36G  20.5G   519M  /zones/webzone
storage/zones/webzone@stable20080502             8.03M      -   517M  -
storage/zones/webzone/webcontent                 1.85G  20.5G  1.85G  legacy
storage/zones/webzone/webcontent@stable20080515      0      -  1.85G  -

I have created a stable snapshot of all filesystems, so if anything should happen to any of the zones or their data, I can quickly roll back to a known state in a few seconds. Also, none of the snapshots are available from within the local zones, hence the legacy mount of webcontent.

On top of this I restricted the maximum addressable amount of memory for each zone to a few hundred megabytes. This is done with the zone.max-swap resource control. I also used Fair Share Scheduling and dedicated 100 shares to the global zone and 10 to each of the other zones.

I also disabled unneeded services in the local zones, which can save a few hundred MB of memory per zone. Disabling svc:/system/webconsole:console saved about 175MB per zone.

I found this setup very useful for internet-connected servers: create one zone per service and only have that service activated in the local zone. Secure the global zone and only use it for administration of the local zones.