Tuesday, January 10. 2012
Brendan Gregg wrote a really interesting article about tracing ZFS: "Activity of the ZFS ARC". It's well worth a read.
Saturday, December 10. 2011
In the past I have written quite often about something I call systemic features: features that fit together so seamlessly that they create possibilities beyond the sum of the individual features. One of these systemic features is the simulation of the cloud. I'm not talking about the thing most people associate with the word cloud (the grid with a credit card checkout), but about the cloud-shaped icon labeled "Network" or "Internet" that sits between the client and the application in many architectural diagrams and that often resembles the "a miracle happens here" box.
It's not new: I talked about this in mid-November at the DOAG conference in Nuremberg, and I've been playing around with this at customers and privately for a while now.
Many customers have networks as large and as complex as the internet of a smaller country perhaps 15 years ago. The interesting question is: how can you test your application's resilience against failures in this cloud-shaped icon? How does your application react when your network is doing its high-availability magic?
And interestingly, Solaris 11 can help you here. The reasoning behind this is pretty simple:
- A router is a computer running an operating environment tailor-made for networking, but at the end of the day it's a computer with an OS (yes, I know, hardware offloading makes this a little more complex, but in principle it's that way).
- A zone is a virtual operating environment.
- Each zone can have its own set of routes.
- Each zone can have its own set of firewall rules.
- Each zone can have its own set of processes.
- Routing protocols are nothing more than processes that collect information from the network and configure the routing table.
- You can install a wide range of dynamic routing daemons in a zone.
- I can have up to 8192 zones (given enough memory).
- In Solaris 11 I can emulate switches (etherstubs).
- I can limit bandwidth in Solaris 11 out of the box with Crossbow.
When I combine all these features, I can set up a vast array of zones that do nothing but take each incoming packet on an interface, route it among themselves along a multitude of paths, and send it out on an outgoing interface. Even when the systems in your environment are placed in many separate networks, you can still do this, either with a system that has many network cards or with something called server-on-a-stick (a single high-bandwidth connection to a VLAN-trunking-capable switch, using the switch ports as a fan-out).
So in order to emulate a complex corporate network, all I have to do is configure a lot of etherstubs, configure many VNICs, replicate the physical bandwidths with the maxbw setting on the VNICs, set up a lot of zones, perhaps translate the ACLs of the routers into rules for the Solaris firewall, install the routing daemons, and configure them similarly to the routers (with regard to timeouts and so on).
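The etherstub and VNIC part of these steps can be scripted. The following is a minimal sketch, assuming each WAN line is modeled as one etherstub with one VNIC per attached router zone; the line names, the router zone names (r1, r2, r3) and the VNIC naming scheme are hypothetical. It only prints dladm create-etherstub and dladm create-vnic commands with a maxbw property; assigning the VNICs to zones with zonecfg, the firewall rules and the routing daemons are not covered.

```python
"""Hypothetical sketch: generate dladm commands for an emulated WAN."""

from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WanLine:
    stub: str                  # etherstub acting as the "wire" (hypothetical name)
    routers: Tuple[str, str]   # router zones at both ends (hypothetical names)
    mbit: int                  # physical line speed in Mbit/s


def dladm_commands(lines: List[WanLine]) -> List[str]:
    """Return one create-etherstub plus one create-vnic per router and line."""
    cmds = []
    for line in lines:
        cmds.append(f"dladm create-etherstub {line.stub}")
        for router in line.routers:
            # Link names must end in a digit, so "r1" + "stub0" -> "r1stub0".
            vnic = f"{router}{line.stub}"
            # maxbw caps the VNIC at the bandwidth of the real line.
            cmds.append(f"dladm create-vnic -l {line.stub} -p maxbw={line.mbit}M {vnic}")
    return cmds


if __name__ == "__main__":
    topology = [
        WanLine("stub0", ("r1", "r2"), 155),  # 155 MBit/s line
        WanLine("stub1", ("r2", "r3"), 155),
        WanLine("stub2", ("r1", "r3"), 2),    # 2 MBit/s backup line
    ]
    for cmd in dladm_commands(topology):
        print(cmd)
```

Generating the commands from a small topology description keeps the emulated bandwidths in one place, which makes it easier to mirror the real line speeds of the routers you are replicating.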
Now I can test how my applications react when the network starts to converge to a new topology because some lines have failed. I can test to which topology my network will converge after a line outage (which is nothing more than a deny-all firewall rule). I can test the impact when the network converges in such a way that my traffic flows over a 2 MBit/s line instead of a 155 MBit/s line. For even more complex failure modes, I can even use the htbx driver to introduce additional latency, packet drops, or packet reordering, as shown in this article. In essence, you can emulate your complete internal network in a single box, and with Zones and Crossbow in Solaris 11 the overhead is so low (in the end it is still just one kernel) that you can really emulate reality and not a simplified view, as you don't have to emulate it via separate hardware or many independent operating system instances in virtual machines.
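To get a feeling for the convergence question before building the zones, the path selection of a link-state protocol can be approximated with a plain shortest-path (Dijkstra) computation. This is a simplified sketch, not what the routing daemons actually run: it assumes OSPF-style link costs derived from a reference bandwidth of 1000 Mbit/s (an assumption), ignores timers, areas and equal-cost multipath, and models a line outage simply as a removed link, analogous to the deny-all firewall rule. The router names and the three-line topology are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

Link = Tuple[str, str]
REFERENCE_MBIT = 1000  # assumed reference bandwidth for the cost calculation


def link_cost(mbit: int) -> int:
    """OSPF-style cost: faster lines are cheaper, minimum cost is 1."""
    return max(1, REFERENCE_MBIT // mbit)


def best_path(links: Dict[Link, int], src: str, dst: str,
              failed: FrozenSet[Link] = frozenset()) -> Optional[Tuple[int, List[str]]]:
    """Dijkstra over the links that are still up; returns (cost, path) or None."""
    graph: Dict[str, List[Tuple[str, int]]] = {}
    for (a, b), mbit in links.items():
        if (a, b) in failed or (b, a) in failed:
            continue  # failed line: like a deny-all rule on both ends
        cost = link_cost(mbit)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, src, [src])]
    done = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in done:
            continue
        done.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None


links = {("r1", "r2"): 155, ("r2", "r3"): 155, ("r1", "r3"): 2}
print(best_path(links, "r1", "r3"))                              # via r2
print(best_path(links, "r1", "r3", frozenset({("r1", "r2")})))   # 2 MBit/s line
```

With all lines up, traffic from r1 to r3 takes the two 155 MBit/s hops (cost 6 + 6 = 12); after the r1-r2 outage it converges onto the 2 MBit/s line (cost 500), which is exactly the situation worth testing with the real application on the emulated network.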
In the end, you could simply use a single Solaris system, put it between all your test systems, and use it as an emulation device for your corporate network. It simulates the cloud-shaped icon in your architectural diagrams.
Monday, December 5. 2011
Before Oracle acquired Sun, there was the Sun SE University. This format is back. From December 13 to 14, 2011, the first Oracle PARTNER SE University takes place in Fulda. The event is intended for partner system engineers, a number of whom, as I know, read this blog. You can find more information here
Tuesday, October 4. 2011
Here is a nice example of the power of boot environments. Boot environments are something like writable snapshots of your operating system installation. As you may already assume, they are based on ZFS snapshots and the clone functionality. This is possible because ZFS is used as the root filesystem.
So: Please don't try this at home. If you do try it, don't try it on any Solaris 11 Express installation of any value. Better yet, don't try it at all. I don't want to hear any stories that you've deleted your ERP system by accident because you used the wrong terminal window. Leave that to trained professional stunt admins with the right equipment (Solaris 11 Express).
Assume you have a system configured with all your applications, and everything is running fine. You think it would be nice to have something like a frozen state of this situation. No problem. These commands will do the trick:
# beadm create rescuenet
# init 6
When the system reboots, you will see the new boot environment as an additional entry in the GRUB menu.
Okay, but first boot into the old environment, the one starting with "Oracle Solaris ...", by selecting it in the GRUB menu (it should already be selected unless you have already run beadm activate). Now I will drop the atomic bomb on your installation.
# rm --no-preserve-root -rf /
Essentially, we've just nuked the installation. After a moment the system should simply freeze. Reset the system and boot again via GRUB into the boot environment starting with "Oracle Solaris ...".
Okay ... on a normal system this would send you to the tapes. With Solaris 11, reset the system and boot into the boot environment "rescuenet" by selecting it in GRUB.
Tada! Just creating a boot environment with a single command after a configuration change may save your butt later ... and by the way, this even works in zones: they know the concept of boot environments, too.
Monday, October 3. 2011
Just a short hint: The What's New document of Solaris 10 Update 9 states that support for IPoIB Connected Mode has been added in this release. However, you have to search a bit to find out how to activate it. The necessary step is documented in the man page for the ibd driver. Let's assume you have two instances of the ibd driver running (ibd0 and ibd1). In this case you have to change one line at the end of the /kernel/drv/ibd.conf file to enable_rc=1,1; and then reload the ibd driver or reboot the system. After that, your ibd devices should show an MTU size of 65520 bytes instead of 2044.
PS: The process for Solaris 11 is better, as you just use dladm for it. And connected mode is the default there anyway. In Solaris 10, unreliable datagram mode was kept as the default, as one of the rules in Solaris is that you have to opt in to such changes between updates.
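Editing that enable_rc line in ibd.conf can be scripted. This is a minimal sketch, assuming the pattern from the example with ibd0 and ibd1 (one comma-separated 1 per ibd instance, terminated by a semicolon) generalizes to any number of instances; check the ibd man page of your release before relying on it. The helper names are hypothetical, and the sketch only transforms the file content as a string, so it never touches /kernel/drv/ibd.conf itself.

```python
import re

# Matches an existing enable_rc=... line anywhere in the file content.
ENABLE_RC = re.compile(r"^enable_rc=[^\n]*$", re.MULTILINE)


def enable_rc_line(instances: int) -> str:
    """Build the enable_rc line with connected mode on for every instance.

    Assumed pattern: one comma-separated value per ibd instance.
    """
    if instances < 1:
        raise ValueError("need at least one ibd instance")
    return "enable_rc=" + ",".join(["1"] * instances) + ";"


def set_connected_mode(conf_text: str, instances: int) -> str:
    """Replace the enable_rc line, or append one if the file has none."""
    line = enable_rc_line(instances)
    if ENABLE_RC.search(conf_text):
        return ENABLE_RC.sub(line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + line + "\n"


print(enable_rc_line(2))  # enable_rc=1,1;  (ibd0 and ibd1)
```

After writing the modified content back, the driver still has to be reloaded or the system rebooted, as described above.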
Thursday, July 21. 2011
My colleague Christophe Pauliat, Principal Sales Consultant at Oracle, came up with a really nifty way to migrate his Solaris-based notebook from a smaller disk to a larger one. I will copy his mail verbatim here, because I think it's extremely useful. It somewhat resembles the "workaround" for ZFS resizing, however Christophe takes this significantly further and does it for boot disks.
OS: Solaris 11 Express 2010.11 + SRU8
Steps:
1) Copy data + OS on the new HDD
a) connection of the new 500 GB HDD as an external USB HDD (using a USB external HDD box)
b) creation of a Solaris2 partition with fdisk, marked active (bootable)
# fdisk /dev/rdsk/c4t0d0p0
c) with the format command, create a partition s0 with all cylinders except cylinder 0
d) Mirroring the existing ZFS pool (rpool) to the new HDD
# zpool attach -f rpool c1t0d0s0 c4t0d0s0
notes:
- c1t0d0 is the 80 GB HDD (old HDD)
- c4t0d0 is the 500 GB HDD (new HDD)
- the option -f is necessary to bypass the warning "partition 0 overlaps partition 2"
e) wait for the sync to be finished (with zpool status)
f) Install Grub on the new HDD
# installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t0d0s0
g) Split the pool rpool by detaching the new HDD to create a new pool
# zpool split rpool rpool2 c4t0d0s0
note: I chose not to detach the old HDD because I wanted it to be usable in case of problem
2) Shutdown OS and laptop, disconnect the USB external HDD and replace the internal 80 GB HDD by the new one
3) Rename the new pool rpool2 to rpool
- Boot from a Solaris 11 Express LiveCD or from the network using AI
note: In my case, I used an AI server I had installed before (Solaris 11 Express 2010.11 with no SRU)
- zpool import rpool2 rpool to rename the pool
- zpool export rpool to export it so that there is no warning in step 4
4) Boot from the new HDD
- It works just fine, but the pool size is still the size of the old HDD (80 GB),
although it uses a 500 GB partition (c4t0d0s0)
5) Increase the pool size to use the whole partition
# zpool set autoexpand=on rpool
The autoexpand property really does most of the trick. The size of a mirrored pool is always the size of the smallest disk. When you have an 80 GB and a 500 GB disk, the size of the pool is 80 GB. Remove the 80 GB disk, and the smallest disk is now the 500 GB one, so the size of the pool becomes 500 GB as well, as long as you've activated autoexpand.
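The sizing rule can be written down as a tiny model. This is a simplified sketch with hypothetical helper names: it ignores label and metadata overhead and the fact that real disks are not round gigabyte numbers. In Christophe's procedure the 80 GB disk leaves via zpool split rather than a detach, but the effect on the new pool is the same: only the 500 GB device remains.

```python
from typing import List


def mirror_size(disks_gb: List[int]) -> int:
    """Usable size of a mirror: limited by its smallest member."""
    if not disks_gb:
        raise ValueError("a mirror needs at least one disk")
    return min(disks_gb)


def size_after_removal(disks_gb: List[int], removed_gb: int,
                       current_gb: int, autoexpand: bool) -> int:
    """Pool size after one disk leaves the mirror (simplified model)."""
    remaining = list(disks_gb)
    remaining.remove(removed_gb)
    if not autoexpand:
        return current_gb  # without autoexpand the pool keeps its old size
    return max(current_gb, mirror_size(remaining))


print(mirror_size([80, 500]))                        # 80
print(size_after_removal([80, 500], 80, 80, True))   # 500
print(size_after_removal([80, 500], 80, 80, False))  # 80
```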
Wednesday, July 6. 2011
An overwhelming number of ZFS installations work with just a bunch of disks, perhaps in a JBOD or in the server itself. However, there are installations that use disk arrays with RAID controllers. Some of those installations even use a single LUN. I don’t think that this is a good idea (e.g. because without redundancy ZFS can only detect corruption, but not repair it), but that’s a different story I don’t want to discuss here.
There is a slight change in the default parameters of ZFS in Solaris 10 Update 9. It’s related to the parameter zfs:zfs_vdev_max_pending. This parameter controls how many I/O requests can be pending per vdev. For example, when you have 100 disks visible to your OS with a zfs:zfs_vdev_max_pending of 2, you have 200 requests outstanding at maximum. When you have 100 disks hidden behind your storage controller that exposes just a single LUN, you will have, as you may have guessed, only 2 pending requests at maximum.
You may think that you could increase the queue depth without limit, but as usual this is a tradeoff and not that easy: longer queues may increase the latency of the commands. Experience showed that a certain queue depth delivered the best performance on most installations.
However, the installed landscape changes, and sometimes you have to adjust things. Exactly this happened a while ago in OpenSolaris, and it seems that this change has moved into Solaris 10. The default for zfs:zfs_vdev_max_pending is now 10. You can check this:
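The arithmetic behind this is simply the number of vdevs ZFS can see multiplied by the per-vdev limit. A trivial sketch (the helper name is hypothetical) that reproduces the example and compares the old default of 35 with the new default of 10 for both layouts:

```python
def max_outstanding_io(visible_vdevs: int, max_pending: int) -> int:
    """Upper bound on queued I/Os: one queue of max_pending per visible vdev."""
    return visible_vdevs * max_pending


# Example from the text: 100 disks, max_pending of 2.
print(max_outstanding_io(100, 2))  # 100 disks visible to the OS -> 200
print(max_outstanding_io(1, 2))    # same disks behind one LUN   -> 2

# Old (35) versus new (10) default for both layouts.
for label, vdevs in (("JBOD, 100 disks", 100), ("array, 1 LUN", 1)):
    for pending in (35, 10):
        print(f"{label}: max_pending={pending} -> {max_outstanding_io(vdevs, pending)}")
```

The single-LUN case drops from 35 to 10 outstanding I/Os for the whole array, which is why that layout is the one most likely to notice the new default.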
# echo zfs_vdev_max_pending::print | mdb -kw
0xa
#
0xa in decimal is 10.
And this is a wise choice for most implementations out there. But it was different in older versions. I checked it on U7, and I asked my Twitter/Facebook contacts to do a quick check on U8, as I was too lazy to install it:
# echo zfs_vdev_max_pending::print | mdb -kw
0x23
#
0x23 in decimal is 35 and 35 was the default up to Update 8 of Solaris 10.
So essentially the queues are shallower than before. For JBODs this is most often a good thing, as each vdev, and thus each LUN, has its own queue of 10 pending I/Os. For a single LUN hiding many disks, it sometimes isn’t. So how do you change it back to the old value?
You can change it dynamically:
# echo zfs_vdev_max_pending/W0t35 | mdb -kw
To make this change boot-persistent you have to add a line to /etc/system:
set zfs:zfs_vdev_max_pending = 35
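The number juggling in these commands (hex output from ::print, the 0t prefix that marks a decimal value for mdb, the /W format that writes a 4-byte value) can be captured in a few helpers. This is a sketch with hypothetical helper names; it only builds the same strings shown above and never runs mdb.

```python
def mdb_value(output: str) -> int:
    """Convert mdb's hexadecimal ::print output such as '0x23' to an int."""
    return int(output.strip(), 16)


def mdb_write_command(symbol: str, value: int) -> str:
    """Live change: /W writes a 4-byte value, 0t marks it as decimal."""
    return f"echo {symbol}/W0t{value} | mdb -kw"


def etc_system_line(module: str, symbol: str, value: int) -> str:
    """Boot-persistent change for /etc/system."""
    return f"set {module}:{symbol} = {value}"


print(mdb_value("0xa"), mdb_value("0x23"))  # 10 35
print(mdb_write_command("zfs_vdev_max_pending", 35))
print(etc_system_line("zfs", "zfs_vdev_max_pending", 35))
```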
Sometimes an even higher value may be warranted when a very large number of disks behind your controller form a single LUN.
How do you know if this decreased queue depth is a problem for you at all? The command iostat will help you:
jmoekamp@hivemind:~$ iostat -xdn
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
6,3 1,9 525,9 31,2 0,1 0,0 16,4 6,0 2 3 c3d0
17,1 1,0 1676,0 8,0 0,2 0,1 11,4 4,8 4 4 c3d1
6,4 1,9 525,8 31,2 0,1 0,0 14,1 4,8 2 2 c4d0
17,1 1,0 1675,9 8,0 0,2 0,1 12,9 4,7 4 4 c4d1
0,0 0,0 0,0 0,0 0,0 0,0 0,0 0,0 0 0 gsdbc
jmoekamp@hivemind:~$
If you see the actv column at or near the value of zfs:zfs_vdev_max_pending, it’s worth trying a higher value. Otherwise it isn’t.
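Checking the actv column by eye is fine for a few disks; for many devices a small filter helps. This is a sketch under stated assumptions: "near" is interpreted as at least 80 % of zfs_vdev_max_pending (an arbitrary, hypothetical threshold), the helper name is hypothetical, and decimal commas as in the German-locale output above are accepted. Note that iostat -xdn without an interval reports averages since boot, so running it with an interval (for example iostat -xdn 5) during the busy period gives more meaningful numbers; repeated header lines in such output are handled.

```python
from typing import List, Tuple


def busy_devices(iostat_text: str, max_pending: int,
                 threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (device, actv) for rows whose actv is near max_pending."""
    header = None
    flagged = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if fields[:1] == ["r/s"]:
            header = fields  # column names of the iostat -xdn table
            continue
        if header is None or len(fields) != len(header):
            continue  # banner, prompt or blank line
        row = dict(zip(header, fields))
        actv = float(row["actv"].replace(",", "."))  # accept decimal commas
        if actv >= threshold * max_pending:
            flagged.append((row["device"], actv))
    return flagged


sample = """extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
6,3 1,9 525,9 31,2 0,1 0,0 16,4 6,0 2 3 c3d0
17,1 1,0 1676,0 8,0 0,2 0,1 11,4 4,8 4 4 c3d1"""
print(busy_devices(sample, 10))  # [] : nowhere near 10 pending I/Os
```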
Thursday, June 23. 2011
At approx. 16:20 the poll had the following outcome:
Thursday, June 2. 2011
The Solaris 10/11 Express Hardware compatibility list has a new home: It's now part of the OTN and available at http://www.oracle.com/webfolder/technetwork/hcl/index.html
Sunday, May 22. 2011
Darren Moffat explained in his blog how to encrypt swap and /var/tmp in Solaris 11 Express.
Wednesday, March 23. 2011
Just got a mail from Stefan Schneider, the developer of the compatibility toolkit, describing what the compatibility tester is really for:
Being the project owner of the tool: Oracle takes extreme care that previous investments of software partners in Solaris 10 will be protected. The tool is spotting fairly extreme corner cases of usage of private or deprecated interfaces and commands. The tool tracks the delta of Solaris 10 (318000 files) to Solaris 11 Express at a command and symbol level. This is about common change management with a strong focus on framework version creep and end of life of components which are outdated for more than a decade. A few examples and their mitigation are discussed here: http://www.scalingbits.com/solaris/compatibility
Thursday, March 17. 2011
There is an interesting page at the Oracle Technology Network summarizing the information about the new packaging of Solaris 11 Express and Solaris 11: "Oracle Solaris 11 Express Package Management with Image Packaging System (IPS)"
Thursday, March 3. 2011
My talk at CeBIT yesterday afternoon wasn't that good; by the standard I set for myself, it was exceptionally bad. It wasn't my presentation, it was a presentation made by the product manager. It wasn't a bad presentation, quite the contrary. However, it wasn't my presentation style, and I assume everybody familiar with my usual style noticed that I didn't feel comfortable while giving this talk. Okay ... at 15:15 I have the opportunity to give the talk again in pavilion 36 on the CeBIT fairground, and I hope it will be much better than yesterday.
Monday, February 21. 2011
A nice article by Alan Hargreaves, also a Principal Field Technologist, explains why the version number of the Apache delivered with Solaris 10 doesn't automatically indicate that it's vulnerable to the attacks reported against Apache since that version: "Now, that being said you may also note after installation that it still identifies as Apache 2.0.63 and you may have concerns about vulnerabilities addressed in 2.0.64 mentioned on the Apache web site.
The way that we maintain Apache on Solaris 10 is not to drop in new releases as they happen, rather we take the fixes mentioned and backport them to our 2.0.63 codebase." Alan, I hope linking to this article relieves you of some additional calls about this topic.
Thursday, February 3. 2011
Stefan Hinker wrote a great document about the secure deployment of LDOMs.