Talk:Single system image
SGI
How about adding SGI to the list?
http://www.sgi.com/company_info/newsroom/press_releases/2004/march/large_scale.html http://www.sgi.com/products/software/starp/
Ericfluger 19:49, 10 March 2007 (UTC)
Proposed rewrite
Since everyone is clear that this article is a mess, I'm starting a proposed rewrite at /Rewrite. In the interests of full disclosure, I acknowledge that I'm an OpenSSI developer. If anyone thinks I'm giving undue weight to OpenSSI, please note it here. HughesJohn (talk) 20:49, 25 September 2008 (UTC)
- A single system image (SSI) is the property of a system that hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource.
- Not very intelligible. HughesJohn (talk) 10:27, 26 September 2008 (UTC)
An exciting new idea: assuming that my proposed categories for SSI support are acceptable, how about doing a feature matrix? HughesJohn (talk) 19:28, 26 September 2008 (UTC)
- Ok, feature matrix is now a feature. HughesJohn (talk) 13:09, 2 October 2008 (UTC)
Be Bold
Well, it's time to be WP:BOLD, so I'm going to put my new article in place. Let's see what fireworks this produces :-) HughesJohn (talk) 13:11, 2 October 2008 (UTC)
Lead section
Keep in mind: "The lead section, lead, or introduction of a Wikipedia article is the section before the table of contents and first heading. The lead serves both as an introduction to the article below and as a short, independent summary of the important aspects of the article's topic." HughesJohn (talk) 13:08, 2 October 2008 (UTC)
Other Aspects
I have filled in the OpenVMS entries in the table.
I would like to discuss/suggest/add some additional aspects for SSI clusters:
A cluster-wide lock manager
Resource locking is cluster wide. Locks survive a node being removed from the cluster, regardless of whether that node was the master of the lock or merely had an interest in it. Note that in some SSI systems, such as OpenVMS, the DLM is one of the fastest (if not the fastest) means of cluster communication.
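As a rough illustration of those semantics, here is a toy Python sketch; the class and method names are invented and do not correspond to any real DLM API. The point is that the lock directory belongs to the cluster as a whole, so the departure of a node (even one that mastered a lock) only removes that node's own holds.
 # Toy model of cluster-wide locking; names are hypothetical, not a real DLM API.
 class ClusterLockDirectory:
     def __init__(self):
         self.locks = {}  # resource name -> set of node ids holding an interest
     def acquire(self, resource, node):
         self.locks.setdefault(resource, set()).add(node)
     def release(self, resource, node):
         self.locks.get(resource, set()).discard(node)
     def node_removed(self, node):
         # Assumed behaviour: locks are re-mastered on a surviving node and
         # only the departed node's own holds are dropped.
         for holders in self.locks.values():
             holders.discard(node)
 
 directory = ClusterLockDirectory()
 directory.acquire("SYS$JOURNAL", node=1)
 directory.acquire("SYS$JOURNAL", node=2)
 directory.node_removed(1)                   # node 1 leaves the cluster
 assert 2 in directory.locks["SYS$JOURNAL"]  # node 2's interest survives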
A single security model
All security mechanisms are cluster wide. A single set of /etc/passwd or SYSUAF and related files is used.
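As a deliberately simple illustration of what "a single set of files" means in practice, the Python sketch below hashes /etc/passwd; on a cluster with a single security model, running it on every node yields the same digest. How it would be dispatched to each node is left out and assumed.
 # Sketch: with a cluster-wide security model the same /etc/passwd is visible
 # everywhere, so every node reports the same digest.
 import hashlib
 
 def passwd_digest(path="/etc/passwd"):
     with open(path, "rb") as f:
         return hashlib.sha256(f.read()).hexdigest()
 
 print(passwd_digest())  # run on each node; all outputs should match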
Kernel integrated
The cluster software is directly integrated into the kernel. A standalone node is effectively a cluster of one node. Turning the cluster software on or off is a configuration option, not a re-install/rebuild. System services are mainly cluster-centric. Cluster membership is present before normal file system and user access is possible, which also means before most daemons can run.
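To make the ordering concrete, here is a hedged Python sketch of an early-boot step that blocks until cluster membership is established before any ordinary daemons are started; the /proc path and quorum value are invented for illustration, not taken from any real SSI implementation.
 # Illustrative only: wait for cluster membership before starting services.
 # The membership file path and quorum count are hypothetical.
 import time
 
 MEMBERS_FILE = "/proc/cluster/members"  # invented path
 QUORUM = 2
 
 def current_members():
     try:
         with open(MEMBERS_FILE) as f:
             return [line.strip() for line in f if line.strip()]
     except FileNotFoundError:
         return []
 
 def wait_for_membership(timeout=60):
     deadline = time.time() + timeout
     while len(current_members()) < QUORUM:
         if time.time() > deadline:
             raise TimeoutError("no cluster membership; not starting daemons")
         time.sleep(1)  # membership precedes normal file system and user access
 
 if __name__ == "__main__":
     wait_for_membership()
     print("cluster membership established; ordinary daemons may now start")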
Cluster communication versus IPC
Much, if not most, of the communication within the cluster is not IPC as such. This is particularly true if the cluster software is fully integrated with the kernel, in which case most of the traffic is kernel to kernel rather than process to process. Thus we might use spinlocks for inter-processor, intra-node coordination and communication, the distributed lock manager for inter-node kernel coordination and communication, and pipes for inter-process communication. Simon L Jackson (talk) 02:52, 11 January 2009 (UTC)
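A small sketch of the layering Simon describes, using ordinary Python primitives as stand-ins; the inter-node lock call is a made-up placeholder, not a binding to any real distributed lock manager.
 # Three distinct coordination mechanisms, shown with stand-ins:
 #  - a lock for intra-node, inter-processor coordination (spinlock analogue)
 #  - a (hypothetical) distributed lock for inter-node kernel coordination
 #  - a pipe for ordinary inter-process communication
 import threading, multiprocessing
 
 intra_node_lock = threading.Lock()
 with intra_node_lock:  # short critical section, confined to one node
     pass
 
 def dlm_lock(resource, mode):
     """Placeholder for an inter-node DLM request; not a real API."""
     raise NotImplementedError("cluster-wide locking lives in the kernel/DLM")
 
 one_end, other_end = multiprocessing.Pipe()  # plain IPC between processes
 one_end.send("hello from one process to another")
 print(other_end.recv())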
Shared Roots
"Shared Roots" might not belong with SSI - A shared root cluster is intermediate between a SSI and an "incoherent" conventional linux "bunch of boxes" cluster. More precisely, all SSIs must have real or virtual shared roots, but not all shared roots need be SSIs: A shared root cluster has a unified filesystem between nodes, but not necessarily a unified process/memory space. A shared root without unification of other aspects is a convenient compromise between scalability and ease of management.
For Linux clusters, the largest (tens of thousands of nodes and more) are "bunch of boxes" clusters; shared roots have to date been used up to thousands of nodes, and SSIs up to hundreds. But SSIs are the easiest to manage and use, then shared roots, then bunches of boxes.
Additional "Shared Roots" thought from Simon L Jackson (talk) 02:26, 11 January 2009 (UTC)
- Historically, the term cluster, as used by DEC from the early 1980s, meant SSI cluster.
- Shared roots are often not particularly shared. Should we distinguish shared boot from shared root? Both OpenVMS and TruCluster can boot off the shared root regardless of whether it is a directly available disk (e.g. via a Y-cabled bus, iSCSI or a SAN) or reached via a network boot (such a node is sometimes referred to as a "satellite" node).
- To share a root, a single security model should be considered.
- Having a fully shared root (or perhaps this should be shared boot) means cluster communication needs to be present very early in the boot process, before the shared root is formally mounted, and therefore before a full IP stack can be running. Thus some SSI clusters either don't use, or prefer not to use, IP for cluster communications; if they do, they use a separate, simplified IP stack. The SCS protocol used by OpenVMS is specifically designed to provide flexible cluster communication and is significantly more efficient than IPv4.
- Followups from HughesJohn (talk) 13:47, 12 January 2009 (UTC) about "shared roots"
- Yeah, when I did my rewrite I was of the opinion (influenced by my OpenSSI background, I expect) that a cluster was SSI, and that to be SSI it had to have the whole kit and caboodle. I came up with the idea of splitting the discussion into a set of features, provided to a greater or lesser extent by different systems, as a way of avoiding flamewars about which were "real" SSI systems (for example, I was originally very dubious about the openMosix claims to be SSI). However, as time passes I think I stumbled on the right idea: SSI is not an absolute; different systems include different SSI features.
- Yes, shared boot is different from shared root. For example, OpenSSI usually doesn't do shared boot: each node boots from its own local disk, then joins the cluster to find the root. (It can do shared boot with Etherboot or PXE, but that has the disadvantage of serialising the boot process.)
- It seems to me that shared root implies a single security model. Maybe we should discuss this in that section.
- These days a full IP stack could be in a network card's boot ROM (Etherboot/PXE), so that doesn't seem to be much of a problem. As I said above, OpenSSI usually boots the full Linux kernel from a local device, so it can handle fairly complex protocols for in-cluster communication (InfiniBand, for example).
- HughesJohn (talk) 13:47, 12 January 2009 (UTC)
Hot node addition/removal?
This is the ability to add or remove nodes at runtime, rather than only at cluster start time. I know that OpenSSI can do it and Kerrighed can't (but is working on it); I'm not sure about the others. It's somewhat important because it affects the purpose of the cluster: is it designed to be highly available, such that individual machines can fail but the cluster lives on? Or does adding systems increase performance but decrease reliability, since any node failure means cluster failure? Essentially the same kind of difference as between (say) RAID-1 and RAID-0, respectively.
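A hedged sketch of that contrast (the class and its interface are invented for illustration): a membership-change handler either absorbs the loss of a node, RAID-1-style, or treats any node failure as fatal to the whole cluster, RAID-0-style.
 # Illustrative contrast only; no real cluster exposes exactly this interface.
 class Cluster:
     def __init__(self, nodes, supports_hot_removal):
         self.nodes = set(nodes)
         self.supports_hot_removal = supports_hot_removal
         self.alive = True
     def node_failed(self, node):
         self.nodes.discard(node)
         if not self.supports_hot_removal:
             self.alive = False  # RAID-0-like: losing one member kills the set
         # otherwise the cluster keeps running on the survivors (RAID-1-like)
 
 ha = Cluster({"n1", "n2", "n3"}, supports_hot_removal=True)
 ha.node_failed("n2")
 assert ha.alive and ha.nodes == {"n1", "n3"}
 
 fragile = Cluster({"n1", "n2"}, supports_hot_removal=False)
 fragile.node_failed("n1")
 assert not fragile.alive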
On a similar note, there's also the question of whether processes can live independently of their initial node. In Mosix-based systems, one node may be doing the heavy lifting CPU-wise, but all I/O and IPC have to be proxied back to the starting node; if that node fails, the process is dead. By contrast, I believe OpenSSI tries to translate as many resources as it can into equivalent resources on the local machine, so only hardware-reliant processes die if their starting node dies. This feature is obviously only relevant if a system supports hot removal, since if a node dies in (say) Kerrighed, your cluster is dead and the whole issue is moot.
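To make that distinction concrete, here is a toy Python sketch; the classes are hypothetical and do not reflect the actual Mosix or OpenSSI internals. In the proxy model every request must go back through a still-living home node, while in the rebind model a migrated process uses local equivalents and only depends on the home node for resources that genuinely live there.
 # Toy contrast between home-node proxying and local rebinding of resources.
 # All names are invented for illustration.
 class Node:
     def __init__(self, alive=True, local_paths=()):
         self.alive = alive
         self.local_paths = set(local_paths)
     def has_equivalent(self, path):
         return path in self.local_paths
     def read(self, path):
         return f"{path} served locally"
     def forward_read(self, path):
         return f"{path} proxied via the home node"
 
 class ProxiedProcess:
     """Mosix-style model: all I/O and IPC go back to the home node."""
     def __init__(self, home):
         self.home = home
     def read(self, path):
         if not self.home.alive:
             raise RuntimeError("home node gone: process cannot continue")
         return self.home.forward_read(path)
 
 class ReboundProcess:
     """OpenSSI-style model: use local equivalents where possible."""
     def __init__(self, home, local):
         self.home, self.local = home, local
     def read(self, path):
         if self.local.has_equivalent(path):
             return self.local.read(path)  # no dependency on the home node
         if not self.home.alive:
             raise RuntimeError("resource was hardware-bound to the dead node")
         return self.home.forward_read(path)
 
 dead_home = Node(alive=False)
 local = Node(local_paths={"/etc/passwd"})
 print(ReboundProcess(dead_home, local).read("/etc/passwd"))  # still works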
Not sure if either or both of these warrant addition to the features section, some new section, and/or the grid. — Wisq (talk) 21:19, 16 September 2009 (UTC)