Search Results

Search found 31 results on 2 pages for 'resilience'.

Page 2/2 | < Previous Page | 1 2

Application Architecture using WCF and System.AddIn

- by Silverhalide

A little background -- we're designing an application that uses a client/server architecture consisting of: A server which loads server-side modules, potentially developed by other teams. A client which loads corresponding client-side modules (also potentially developed by those other teams; each client module corresponds with a server module). The client side communicates with the server side for general coordination, and as well as module specific tasks. (At this point, I think that means client talks to server, client modules talk to server modules.) Environment is .NET 3.5, and client side is WPF. The deployment scenario introduces the potential to upgrade the server, any server-side module, the client, and any client-side module independently. However, being able to "work" using mismatched versions is required. I'm therefore concerned about versioning issues. My thinking so far: A Windows Service for the server. Using System.AddIn for the server to load and communicate with the server modules will give us the greatest flexibility in terms of version compatability between server and server modules. The server and each server module vend WCF services for communication to the client side; communication between the server and a server module, or between two server modules use the AddIn contracts. (One advantage of this is that a module can expose a different interface within the server and outside it.) Similarly, the client uses System.AddIn to find, load, and communicate with the client modules. Client communications with client modules is via the AddIn interface; communications from the client and from client modules to the server side are via WCF. For maximum resilience, each module will run in a separate app-domain. In general, the system has modest performance requirements, so marshalling and crossing process boundaries is not expected to be a performance concern. (Performance requirement is basically summed up by: don't get in the way of the other parts of the system not described here.) My questions are around the idea of having two different communication and versioning models to work with which will be an added burden on our developers. System.AddIn seems quite powerful, but also a little unwieldly. (I'm also unsure of Microsoft's commitment to it in the future.) On the other hand, I'm not thrilled with WCF's versioning capabilities. I have a feeling that it would be possible to implement the System.AddIn view/adapter/contract system within WCF, but being fairly new to both technologies, I would have no idea of where to start. So... Am I on the right track here? Am I doing this the hard way? Are there gotchas I need to be aware of on this road? Thanks.

Read the article
A Guided Tour of Complexity

- by JoshReuben

I just re-read Complexity – A Guided Tour by Melanie Mitchell , protégé of Douglas Hofstadter ( author of “Gödel, Escher, Bach”) http://www.amazon.com/Complexity-Guided-Tour-Melanie-Mitchell/dp/0199798109/ref=sr_1_1?ie=UTF8&qid=1339744329&sr=8-1 here are some notes and links: Evolved from Cybernetics, General Systems Theory, Synergetics some interesting transdisciplinary fields to investigate: Chaos Theory - http://en.wikipedia.org/wiki/Chaos_theory – small differences in initial conditions (such as those due to rounding errors in numerical computation) yield widely diverging outcomes for chaotic systems, rendering long-term prediction impossible. System Dynamics / Cybernetics - http://en.wikipedia.org/wiki/System_Dynamics – study of how feedback changes system behavior Network Theory - http://en.wikipedia.org/wiki/Network_theory – leverage Graph Theory to analyze symmetric / asymmetric relations between discrete objects Algebraic Topology - http://en.wikipedia.org/wiki/Algebraic_topology – leverage abstract algebra to analyze topological spaces There are limits to deterministic systems & to computation. Chaos Theory definitely applies to training an ANN (artificial neural network) – different weights will emerge depending upon the random selection of the training set. In recursive Non-Linear systems http://en.wikipedia.org/wiki/Nonlinear_system – output is not directly inferable from input. E.g. a Logistic map: Xt+1 = R Xt(1-Xt) Different types of bifurcations, attractor states and oscillations may occur – e.g. a Lorenz Attractor http://en.wikipedia.org/wiki/Lorenz_system Feigenbaum Constants http://en.wikipedia.org/wiki/Feigenbaum_constants express ratios in a bifurcation diagram for a non-linear map – the convergent limit of R (the rate of period-doubling bifurcations) is 4.6692016 Maxwell’s Demon - http://en.wikipedia.org/wiki/Maxwell%27s_demon - the Second Law of Thermodynamics has only a statistical certainty – the universe (and thus information) tends towards entropy. While any computation can theoretically be done without expending energy, with finite memory, the act of erasing memory is permanent and increases entropy. Life & thought is a counter-example to the universe’s tendency towards entropy. Leo Szilard and later Claude Shannon came up with the Information Theory of Entropy - http://en.wikipedia.org/wiki/Entropy_(information_theory) whereby Shannon entropy quantifies the expected value of a message’s information in bits in order to determine channel capacity and leverage Coding Theory (compression analysis). Ludwig Boltzmann came up with Statistical Mechanics - http://en.wikipedia.org/wiki/Statistical_mechanics – whereby our Newtonian perception of continuous reality is a probabilistic and statistical aggregate of many discrete quantum microstates. This is relevant for Quantum Information Theory http://en.wikipedia.org/wiki/Quantum_information and the Physics of Information - http://en.wikipedia.org/wiki/Physical_information. Hilbert’s Problems http://en.wikipedia.org/wiki/Hilbert's_problems pondered whether mathematics is complete, consistent, and decidable (the Decision Problem – http://en.wikipedia.org/wiki/Entscheidungsproblem – is there always an algorithm that can determine whether a statement is true). Godel’s Incompleteness Theorems http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems proved that mathematics cannot be both complete and consistent (e.g. “This statement is not provable”). Turing through the use of Turing Machines (http://en.wikipedia.org/wiki/Turing_machine symbol processors that can prove mathematical statements) and Universal Turing Machines (http://en.wikipedia.org/wiki/Universal_Turing_machine Turing Machines that can emulate other any Turing Machine via accepting programs as well as data as input symbols) that computation is limited by demonstrating the Halting Problem http://en.wikipedia.org/wiki/Halting_problem (is is not possible to know when a program will complete – you cannot build an infinite loop detector). You may be used to thinking of 1 / 2 / 3 dimensional systems, but Fractal http://en.wikipedia.org/wiki/Fractal systems are defined by self-similarity & have non-integer Hausdorff Dimensions !!! http://en.wikipedia.org/wiki/List_of_fractals_by_Hausdorff_dimension – the fractal dimension quantifies the number of copies of a self similar object at each level of detail – eg Koch Snowflake - http://en.wikipedia.org/wiki/Koch_snowflake Definitions of complexity: size, Shannon entropy, Algorithmic Information Content (http://en.wikipedia.org/wiki/Algorithmic_information_theory - size of shortest program that can generate a description of an object) Logical depth (amount of info processed), thermodynamic depth (resources required). Complexity is statistical and fractal. John Von Neumann’s other machine was the Self-Reproducing Automaton http://en.wikipedia.org/wiki/Self-replicating_machine . Cellular Automata http://en.wikipedia.org/wiki/Cellular_automaton are alternative form of Universal Turing machine to traditional Von Neumann machines where grid cells are locally synchronized with their neighbors according to a rule. Conway’s Game of Life http://en.wikipedia.org/wiki/Conway's_Game_of_Life demonstrates various emergent constructs such as “Glider Guns” and “Spaceships”. Cellular Automatons are not practical because logical ops require a large number of cells – wasteful & inefficient. There are no compilers or general program languages available for Cellular Automatons (as far as I am aware). Random Boolean Networks http://en.wikipedia.org/wiki/Boolean_network are extensions of cellular automata where nodes are connected at random (not to spatial neighbors) and each node has its own rule –> they demonstrate the emergence of complex & self organized behavior. Stephen Wolfram’s (creator of Mathematica, so give him the benefit of the doubt) New Kind of Science http://en.wikipedia.org/wiki/A_New_Kind_of_Science proposes the universe may be a discrete Finite State Automata http://en.wikipedia.org/wiki/Finite-state_machine whereby reality emerges from simple rules. I am 2/3 through this book. It is feasible that the universe is quantum discrete at the plank scale and that it computes itself – Digital Physics: http://en.wikipedia.org/wiki/Digital_physics – a simulated reality? Anyway, all behavior is supposedly derived from simple algorithmic rules & falls into 4 patterns: uniform , nested / cyclical, random (Rule 30 http://en.wikipedia.org/wiki/Rule_30) & mixed (Rule 110 - http://en.wikipedia.org/wiki/Rule_110 localized structures – it is this that is interesting). interaction between colliding propagating signal inputs is then information processing. Wolfram proposes the Principle of Computational Equivalence - http://mathworld.wolfram.com/PrincipleofComputationalEquivalence.html - all processes that are not obviously simple can be viewed as computations of equivalent sophistication. Meaning in information may emerge from analogy & conceptual slippages – see the CopyCat program: http://cognitrn.psych.indiana.edu/rgoldsto/courses/concepts/copycat.pdf Scale Free Networks http://en.wikipedia.org/wiki/Scale-free_network have a distribution governed by a Power Law (http://en.wikipedia.org/wiki/Power_law - much more common than Normal Distribution). They are characterized by hubs (resilience to random deletion of nodes), heterogeneity of degree values, self similarity, & small world structure. They grow via preferential attachment http://en.wikipedia.org/wiki/Preferential_attachment – tipping points triggered by positive feedback loops. 2 theories of cascading system failures in complex systems are Self-Organized Criticality http://en.wikipedia.org/wiki/Self-organized_criticality and Highly Optimized Tolerance http://en.wikipedia.org/wiki/Highly_optimized_tolerance. Computational Mechanics http://en.wikipedia.org/wiki/Computational_mechanics – use of computational methods to study phenomena governed by the principles of mechanics. This book is a great intuition pump, but does not cover the more mathematical subject of Computational Complexity Theory – http://en.wikipedia.org/wiki/Computational_complexity_theory I am currently reading this book on this subject: http://www.amazon.com/Computational-Complexity-Christos-H-Papadimitriou/dp/0201530821/ref=pd_sim_b_1 stay tuned for that review!

Read the article
How to Achieve OC4J RMI Load Balancing

- by fip

This is an old, Oracle SOA and OC4J 10G topic. In fact this is not even a SOA topic per se. Questions of RMI load balancing arise when you developed custom web applications accessing human tasks running off a remote SOA 10G cluster. Having returned from a customer who faced challenges with OC4J RMI load balancing, I felt there is still some confusions in the field how OC4J RMI load balancing work. Hence I decide to dust off an old tech note that I wrote a few years back and share it with the general public. Here is the tech note: Overview A typical use case in Oracle SOA is that you are building web based, custom human tasks UI that will interact with the task services housed in a remote BPEL 10G cluster. Or, in a more generic way, you are just building a web based application in Java that needs to interact with the EJBs in a remote OC4J cluster. In either case, you are talking to an OC4J cluster as RMI client. Then immediately you must ask yourself the following questions: 1. How do I make sure that the web application, as an RMI client, even distribute its load against all the nodes in the remote OC4J cluster? 2. How do I make sure that the web application, as an RMI client, is resilient to the node failures in the remote OC4J cluster, so that in the unlikely case when one of the remote OC4J nodes fail, my web application will continue to function? That is the topic of how to achieve load balancing with OC4J RMI client. Solutions You need to configure and code RMI load balancing in two places: 1. Provider URL can be specified with a comma separated list of URLs, so that the initial lookup will land to one of the available URLs. 2. Choose a proper value for the oracle.j2ee.rmi.loadBalance property, which, along side with the PROVIDER_URL property, is one of the JNDI properties passed to the JNDI lookup.(http://docs.oracle.com/cd/B31017_01/web.1013/b28958/rmi.htm#BABDGFBI) More details below: About the PROVIDER_URL The JNDI property java.name.provider.url's job is, when the client looks up for a new context at the very first time in the client session, to provide a list of RMI context The value of the JNDI property java.name.provider.url goes by the format of a single URL, or a comma separate list of URLs. A single URL. For example: opmn:ormi://host1:6003:oc4j_instance1/appName1 A comma separated list of multiple URLs. For examples: opmn:ormi://host1:6003:oc4j_instanc1/appName, opmn:ormi://host2:6003:oc4j_instance1/appName, opmn:ormi://host3:6003:oc4j_instance1/appName When the client looks up for a new Context the very first time in the client session, it sends a query against the OPMN referenced by the provider URL. The OPMN host and port specifies the destination of such query, and the OC4J instance name and appName are actually the “where clause” of the query. When the PROVIDER URL reference a single OPMN server Let's consider the case when the provider url only reference a single OPMN server of the destination cluster. In this case, that single OPMN server receives the query and returns a list of the qualified Contexts from all OC4Js within the cluster, even though there is a single OPMN server in the provider URL. A context represent a particular starting point at a particular server for subsequent object lookup. For example, if the URL is opmn:ormi://host1:6003:oc4j_instance1/appName, then, OPMN will return the following contexts: appName on oc4j_instance1 on host1 appName on oc4j_instance1 on host2, appName on oc4j_instance1 on host3, (provided that host1, host2, host3 are all in the same cluster) Please note that One OPMN will be sufficient to find the list of all contexts from the entire cluster that satisfy the JNDI lookup query. You can do an experiment by shutting down appName on host1, and observe that OPMN on host1 will still be able to return you appname on host2 and appName on host3. When the PROVIDER URL reference a comma separated list of multiple OPMN servers When the JNDI propery java.naming.provider.url references a comma separated list of multiple URLs, the lookup will return the exact same things as with the single OPMN server: a list of qualified Contexts from the cluster. The purpose of having multiple OPMN servers is to provide high availability in the initial context creation, such that if OPMN at host1 is unavailable, client will try the lookup via OPMN on host2, and so on. After the initial lookup returns and cache a list of contexts, the JNDI URL(s) are no longer used in the same client session. That explains why removing the 3rd URL from the list of JNDI URLs will not stop the client from getting the EJB on the 3rd server. About the oracle.j2ee.rmi.loadBalance Property After the client acquires the list of contexts, it will cache it at the client side as “list of available RMI contexts”. This list includes all the servers in the destination cluster. This list will stay in the cache until the client session (JVM) ends. The RMI load balancing against the destination cluster is happening at the client side, as the client is switching between the members of the list. Whether and how often the client will fresh the Context from the list of Context is based on the value of the oracle.j2ee.rmi.loadBalance. The documentation at http://docs.oracle.com/cd/B31017_01/web.1013/b28958/rmi.htm#BABDGFBI list all the available values for the oracle.j2ee.rmi.loadBalance. Value Description client If specified, the client interacts with the OC4J process that was initially chosen at the first lookup for the entire conversation. context Used for a Web client (servlet or JSP) that will access EJBs in a clustered OC4J environment. If specified, a new Context object for a randomly-selected OC4J instance will be returned each time InitialContext() is invoked. lookup Used for a standalone client that will access EJBs in a clustered OC4J environment. If specified, a new Context object for a randomly-selected OC4J instance will be created each time the client calls Context.lookup(). Please note the regardless of the setting of oracle.j2ee.rmi.loadBalance property, the “refresh” only occurs at the client. The client can only choose from the "list of available context" that was returned and cached from the very first lookup. That is, the client will merely get a new Context object from the “list of available RMI contexts” from the cache at the client side. The client will NOT go to the OPMN server again to get the list. That also implies that if you are adding a node to the server cluster AFTER the client’s initial lookup, the client would not know it because neither the server nor the client will initiate a refresh of the “list of available servers” to reflect the new node. About High Availability (i.e. Resilience Against Node Failure of Remote OC4J Cluster) What we have discussed above is about load balancing. Let's also discuss high availability. This is how the High Availability works in RMI: when the client use the context but get an exception such as socket is closed, it knows that the server referenced by that Context is problematic and will try to get another unused Context from the “list of available contexts”. Again, this list is the list that was returned and cached at the very first lookup in the entire client session.

Read the article
Evaluating Solutions to Manage Product Compliance? Don't Wait Much Longer

- by Kerrie Foy

Depending on severity, product compliance issues can cause all sorts of problems from run-away budgets to business closures. But effective policies and safeguards can create a strong foundation for innovation, productivity, market penetration and competitive advantage. If you’ve been putting off a systematic approach to product compliance, it is time to reconsider that decision, or indecision. Why now? No matter what industry, companies face a litany of worldwide and regional regulations that require proof of product compliance and environmental friendliness for market access. For example, Restriction of Hazardous Substances (RoHS) is a regulation that restricts the use of six dangerous materials used in the manufacture of electronic and electrical equipment. ROHS was originally adopted by the European Union in 2003 for implementation in 2006, and it has evolved over time through various regional versions for North America, China, Japan, Korea, Norway and Turkey. In addition, the RoHS directive allowed for material exemptions used in Medical Devices, but that exemption ends in 2014. Additional regulations worth watching are the Battery Directive, Waste Electrical and Electronic Equipment (WEEE), and Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) directives. Additional evolving regulations are coming from governing bodies like the Food and Drug Administration (FDA) and the International Organization for Standardization (ISO). Corporate sustainability initiatives are also gaining urgency and influencing product design. In a survey of 405 corporations in the Global 500 by Carbon Disclosure Project, co-written by PwC (CDP Global 500 Climate Change Report 2012 entitled Business Resilience in an Uncertain, Resource-Constrained World), 48% of the respondents indicated they saw potential to create new products and business services as a response to climate change. Just 21% reported a dedicated budget for the research. However, the report goes on to explain that those few companies are winning over new customers and driving additional profits by exploiting their abilities to adapt to environmental needs. The article cites Dell as an example – Dell has invested in research to develop new products designed to reduce its customers’ emissions by more than 10 million metric tons of CO2e per year. This reduction in emissions should save Dell’s customers over $1billion per year as a result! Over time we expect to see many additional companies prove that eco-design provides marketplace benefits through differentiation and direct customer value. How do you meet compliance requirements and also successfully invest in eco-friendly designs? No doubt companies struggle to answer this question. After all, the journey to get there may involve transforming business models, go-to-market strategies, supply networks, quality assurance policies and compliance processes per the rapidly evolving global and regional directives. There may be limited executive focus on the initiative, inability to quantify noncompliance, or not enough resources to justify investment. To make things even more difficult to address, compliance responsibility can be a passionate topic within an organization, making the prospect of change on an enterprise scale problematic and time-consuming. Without a single source of truth for product data and without proper processes in place, ensuring product compliance burgeons into a crushing task that is cost-prohibitive and overwhelming to an organization. With all the overhead, certain markets or demographics become simply inaccessible. Therefore, the risk to consumer goodwill and satisfaction, revenue, business continuity, and market potential is too great not to solve the compliance challenge. Companies are beginning to adapt and even thrive in today’s highly regulated and transparent environment by implementing systematic approaches to product compliance that are more than functional bandages but revenue-generating engines. Consider partnering with Oracle to help you address your compliance needs. Many of the world’s most innovative leaders and pioneers are leveraging Oracle’s Agile Product Lifecycle Management (PLM) portfolio of enterprise applications to manage the product value chain, centralize product data, automate processes, and launch more eco-friendly products to market faster. Particularly, the Agile Product Governance & Compliance (PG&C) solution provides out-of-the-box functionality to integrate actionable regulatory information into the enterprise product record from the ideation to the disposal/recycling phase. Agile PG&C makes it possible to efficiently manage compliance per corporate green initiatives as well as regional and global directives. Options are critical, but so is ease-of-use. Anyone who’s grappled with compliance policy knows legal interpretation plays a major role in determining how an organization responds to regulation. Agile PG&C gives you the freedom to configure product compliance per your needs, while maintaining rigorous control over the product record in an easy-to-use interface that facilitates adoption efforts. It allows you to assign regulations as specifications for a part or BOM roll-up. Each specification has a threshold value that alerts you to a non-compliance issue if the threshold value is exceeded. Set however many regulations as specifications you need to make sure a product can be sold in your target countries. Another option is to implement like one of our leading consumer electronics customers and define your own “catch-all” specification to ensure compliance in all markets. You can give your suppliers secure access to enter their component data or integrate a third party’s data. With Agile PG&C you are able to design compliance earlier into your products to reduce cost and improve quality downstream when stakes are higher. Agile PG&C is a comprehensive solution that makes product compliance more reliable and efficient. Throughout product lifecycles, use the solution to support full material disclosures, efficiently manage declarations with your suppliers, feed compliance data into a corrective action if a product must be changed, and swiftly satisfy audits by showing all due diligence tracked in one solution. Given the compounding regulation and consumer focus on urgent environmental issues, now is the time to act. Implementing an enterprise, systematic approach to product compliance is a competitive investment. From the start, Agile Product Governance & Compliance enables companies to confidently design for compliance and sustainability, reduce the cost of compliance, minimize the risk of business interruption, deliver responsible products, and inspire new innovation. Don’t wait any longer! To find out more about Agile Product Governance & Compliance download the data sheet, contact your sales representative, or call Oracle at 1-800-633-0738. Many thanks to Shane Goodwin, Senior Manager, Oracle Agile PLM Product Management, for contributions to this article.

Read the article
Building a SOA/BPM/BAM Cluster Part I – Preparing the Environment

- by antony.reynolds

An increasing number of customers are using SOA Suite in a cluster configuration, I might hazard to say that the majority of production deployments are now using SOA clusters. So I thought it may be useful to detail the steps in building an 11g cluster and explain a little about why things are done the way they are. In this series of posts I will explain how to build a SOA/BPM cluster using the Enterprise Deployment Guide. This post will explain the setting required to prepare the cluster for installation and configuration. Software Required The following software is required for an 11.1.1.3 SOA/BPM install. Software Version Notes Oracle Database Certified databases are listed here SOA & BPM Suites require a working database installation. Repository Creation Utility (RCU) 11.1.1.3 If upgrading an 11.1.1.2 repository then a separate script is available. Web Tier Utilities 11.1.1.3 Provides Web Server, 11.1.1.3 is an upgrade to 11.1.1.2, so 11.1.1.2 must be installed first. Web Tier Utilities 11.1.1.3 Web Server, 11.1.1.3 Patch. You can use the 11.1.1.2 version without problems. Oracle WebLogic Server 11gR1 10.3.3 This is the host platform for 11.1.1.3 SOA/BPM Suites. SOA Suite 11.1.1.2 SOA Suite 11.1.1.3 is an upgrade to 11.1.1.2, so 11.1.1.2 must be installed first. SOA Suite 11.1.1.3 SOA Suite 11.1.1.3 patch, requires 11.1.12 to have been installed. My installation was performed on Oracle Enterprise Linux 5.4 64-bit. Database I will not cover setting up the database in this series other than to identify the database requirements. If setting up a SOA cluster then ideally we would also be using a RAC database. I assume that this is running on separate machines to the SOA cluster. Section 2.1, “Database”, of the EDG covers the database configuration in detail. Settings The database should have processes set to at least 400 if running SOA/BPM and BAM. alter system set processes=400 scope=spfile Run RCU The Repository Creation Utility creates the necessary database tables for the SOA Suite. The RCU can be run from any machine that can access the target database. In 11g the RCU creates a number of pre-defined users and schema with a user defiend prefix. This allows you to have multiple 11g installations in the same database. After running the RCU you need to grant some additional privileges to the soainfra user. The soainfra user should have privileges on the transaction tables. grant select on sys.dba_pending_transactions to prefix_soainfra Grant force any transaction to prefix_soainfra Machines The cluster will be built on the following machines. EDG Name is the name used for this machine in the EDG. Notes are a description of the purpose of the machine. EDG Name Notes LB External load balancer to distribute load across and failover between web servers. WEBHOST1 Hosts a web server. WEBHOST2 Hosts a web server. SOAHOST1 Hosts SOA components. SOAHOST2 Hosts SOA components. BAMHOST1 Hosts BAM components. BAMHOST2 Hosts BAM components. Note that it is possible to collapse the BAM servers so that they run on the same machines as the SOA servers. In this case BAMHOST1 and SOAHOST1 would be the same, as would BAMHOST2 and SOAHOST2. The cluster may include more than 2 servers and in this case we add SOAHOST3, SOAHOST4 etc as needed. My cluster has WEBHOST1, SOAHOST1 and BAMHOST1 all running on a single machine. Software Components The cluster will use the following software components. EDG Name is the name used for this machine in the EDG. Type is the type of component, generally a WebLogic component. Notes are a description of the purpose of the component. EDG Name Type Notes AdminServer Admin Server Domain Admin Server WLS_WSM1 Managed Server Web Services Manager Policy Manager Server WLS_WSM2 Managed Server Web Services Manager Policy Manager Server WLS_SOA1 Managed Server SOA/BPM Managed Server WLS_SOA2 Managed Server SOA/BPM Managed Server WLS_BAM1 Managed Server BAM Managed Server running Active Data Cache WLS_BAM2 Managed Server BAM Manager Server without Active Data Cache Node Manager Will run on all hosts with WLS servers OHS1 Web Server Oracle HTTP Server OHS2 Web Server Oracle HTTP Server LB Load Balancer Load Balancer, not part of SOA Suite The above assumes a 2 node cluster. Network Configuration The SOA cluster requires an extensive amount of network configuration. I would recommend assigning a private sub-net (internal IP addresses such as 10.x.x.x, 192.168.x.x or 172.168.x.x) to the cluster for use by addresses that only need to be accessible to the Load Balancer or other cluster members. Section 2.2, "Network", of the EDG covers the network configuration in detail. EDG Name is the hostname used in the EDG. IP Name is the IP address name used in the EDG. Type is the type of IP address: Fixed is fixed to a single machine. Floating is assigned to one of several machines to allow for server migration. Virtual is assigned to a load balancer and used to distribute load across several machines. Host is the host where this IP address is active. Note for floating IP addresses a range of hosts is given. Bound By identifies which software component will use this IP address. Scope shows where this IP address needs to be resolved. Cluster scope addresses only have to be resolvable by machines in the cluster, i.e. the machines listed in the previous section. These addresses are only used for inter-cluster communication or for access by the load balancer. Internal scope addresses Notes are comments on why that type of IP is used. EDG Name IP Name Type Host Bound By Scope Notes ADMINVHN VIP1 Floating SOAHOST1-SOAHOSTn AdminServer Cluster Admin server, must be able to migrate between SOA server machines. SOAHOST1 IP1 Fixed SOAHOST1 NodeManager, WLS_WSM1 Cluster WSM Server 1 does not require server migration. SOAHOST2 IP2 Fixed SOAHOST1 NodeManager, WLS_WSM2 Cluster WSM Server 2 does not require server migration SOAHOST1VHN VIP2 Floating SOAHOST1-SOAHOSTn WLS_SOA1 Cluster SOA server 1, must be able to migrate between SOA server machines SOAHOST2VHN VIP3 Floating SOAHOST1-SOAHOSTn WLS_SOA2 Cluster SOA server 2, must be able to migrate between SOA server machines BAMHOST1 IP4 Fixed BAMHOST1 NodeManager Cluster BAMHOST1VHN VIP4 Floating BAMHOST1-BAMHOSTn WLS_BAM1 Cluster BAM server 1, must be able to migrate between BAM server machines BAMHOST2 IP3 Fixed BAMHOST2 NodeManager, WLS_BAM2 Cluster BAM server 2 does not require server migration WEBHOST1 IP5 Fixed WEBHOST1 OHS1 Cluster WEBHOST2 IP6 Fixed WEBHOST2 OHS2 Cluster soa.mycompany.com VIP5 Virtual LB LB Public External access point to SOA cluster. admin.mycompany.com VIP6 Virtual LB LB Internal Internal access to WLS console and EM soainternal.mycompany.com VIP7 Virtual LB LB Internal Internal access point to SOA cluster Floating IP addresses are IP addresses that may be re-assigned between machines in the cluster. For example in the event of failure of SOAHOST1 then WLS_SOA1 will need to be migrated to another server. In this case VIP2 (SOAHOST1VHN) will need to be activated on the new target machine. Once set up the node manager will manage registration and removal of the floating IP addresses with the exception of the AdminServer floating IP address. Note that if the BAMHOSTs and SOAHOSTs are the same machine then you can obviously share the hostname and fixed IP addresses, but you still need separate floating IP addresses for the different managed servers. The hostnames don’t have to be the ones given in the EDG, but they must be distinct in the same way as the ETC names are distinct. If the type is a fixed IP then if the addresses are the same you can use the same hostname, for example if you collapse the soahost1, bamhost1 and webhost1 onto a single machine then you could refer to them all as HOST1 and give them the same IP address, however SOAHOST1VHN can never be the same as BAMHOST1VHN because these are floating IP addresses. Notes on DNS IP addresses that are of scope “Cluster” just need to be in the hosts file (/etc/hosts on Linux, C:\Windows\System32\drivers\etc\hosts on Windows) of all the machines in the cluster and the load balancer. IP addresses that are of scope “Internal” need to be available on the internal DNS servers, whilst IP addresses of scope “Public” need to be available on external and internal DNS servers. Shared File System At a minimum the cluster needs shared storage for the domain configuration, XA transaction logs and JMS file stores. It is also possible to place the software itself on a shared server. I strongly recommend that all machines have the same file structure for their SOA installation otherwise you will experience pain! Section 2.3, "Shared Storage and Recommended Directory Structure", of the EDG covers the shared storage recommendations in detail. The following shorthand is used for locations: ORACLE_BASE is the root of the file system used for software and configuration files. MW_HOME is the location used by the installed SOA/BPM Suite installation. This is also used by the web server installation. In my installation it is set to <ORACLE_BASE>/SOA11gPS2. ORACLE_HOME is the location of the Oracle SOA components or the Oracle Web components. This directory is installed under the the MW_HOME but the name is decided by the user at installation, default values are Oracle_SOA1 and Oracle_Web1. In my installation they are set to <MW_HOME>/Oracle_SOA and <MW_HOME>/Oracle _WEB. ORACLE_COMMON_HOME is the location of the common components and is located under the MW_HOME directory. This is always <MW_HOME>/oracle_common. ORACLE_INSTANCE is used by the Oracle HTTP Server and/or Oracle Web Cache. It is recommended to create it under <ORACLE_BASE>/admin. In my installation they are set to <ORACLE_BASE>/admin/Web1, <ORACLE_BASE>/admin/Web2 and <ORACLE_BASE>/admin/WC1. WL_HOME is the WebLogic server home and is always found at <MW_HOME>/wlserver_10.3. Key file locations are shown below. Directory Notes <ORACLE_BASE>/admin/domain_name/aserver/domain_name Shared location for domain. Used to allow admin server to manually fail over between machines. When creating domain_name provide the aserver directory as the location for the domain. In my install this is <ORACLE_BASE>/admin/aserver/soa_domain as I only have one domain on the box. <ORACLE_BASE>/admin/domain_name/aserver/applications Shared location for deployed applications. Needs to be provided when creating the domain. In my install this is <ORACLE_BASE>/admin/aserver/applications as I only have one domain on the box. <ORACLE_BASE>/admin/domain_name/mserver/domain_name Either unique location for each machine or can be shared between machines to simplify task of packing and unpacking domain. This acts as the managed server configuration location. Keeping it separate from Admin server helps to avoid problems with the managed servers messing up the Admin Server. In my install this is <ORACLE_BASE>/admin/mserver/soa_domain as I only have one domain on the box. <ORACLE_BASE>/admin/domain_name/mserver/applications Either unique location for each machine or can be shared between machines. Holds deployed applications. In my install this is <ORACLE_BASE>/admin/mserver/applications as I only have one domain on the box. <ORACLE_BASE>/admin/domain_name/soa_cluster_name Shared directory to hold the following dd – deployment descriptors jms – shared JMS file stores fadapter – shared file adapter co-ordination files tlogs – shared transaction log files In my install this is <ORACLE_BASE>/admin/soa_cluster. <ORACLE_BASE>/admin/instance_name Local folder for web server (OHS) instance. In my install this is <ORACLE_BASE>/admin/web1 and <ORACLE_BASE>/admin/web2. I also have <ORACLE_BASE>/admin/wc1 for the Web Cache I use as a load balancer. <ORACLE_BASE>/product/fmw This can be a shared or local folder for the SOA/BPM Suite software. I used a shared location so I only ran the installer once. In my install this is <ORACLE_BASE>/SOA11gPS2 All the shared files need to be put onto a shared storage media. I am using NFS, but recommendation for production would be a SAN, with mirrored disks for resilience. Collapsing Environments To reduce the hardware requirements it is possible to collapse the BAMHOST, SOAHOST and WEBHOST machines onto a single physical machine. This will require more memory but memory is a lot cheaper than additional machines. For environments that require higher security then stay with a separate WEBHOST tier as per the EDG. Similarly for high volume environments then keep a separate set of machines for BAM and/or Web tier as per the EDG. Notes on Dev Environments In a dev environment it is acceptable to use a a single node (non-RAC) database, but be aware that the config of the data sources is different (no need to use multi-data source in WLS). Typically in a dev environment we will collapse the BAMHOST, SOAHOST and WEBHOST onto a single machine and use a software load balancer. To test a cluster properly we will need at least 2 machines. For my test environment I used Oracle Web Cache as a load balancer. I ran it on one of the SOA Suite machines and it load balanced across the Web Servers on both machines. This was easy for me to set up and I could administer it from a web based console.

Read the article
NFS issue brings down entire vSphere ESX estate

- by growse

I experienced an odd issue this morning where an NFS issue appeared to have taken down the majority of my VMs hosted on a small vSphere 5.0 estate. The infrastructure itself is 4x IBM HS21 blades running around 20 VMs. The storage is provided by a single HP X1600 array with attached D2700 chassis running Solaris 11. There's a couple of storage pools on this which are exposed over NFS for the storage of the VM files, and some iSCSI LUNs for things like MSCS shared disks. Normally, this is pretty stable, but I appreciate the lack of resiliancy in having a single X1600 doing all the storage. This morning, in the logs of each ESX host, at around 0521 GMT I saw a lot of entries like this: 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4cf9a8 3 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4dc9e8 3 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4d3fa8 3 2011-11-30T05:21:54.161Z cpu2:2050)NFSLock: 608: Stop accessing fd 0x41000a4de0a8 3 [....] 2011-11-30T06:16:07.042Z cpu0:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:17:01.459Z cpu2:4011)NFS: 292: Restored connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:25:17.887Z cpu3:2051)NFSLock: 608: Stop accessing fd 0x41000a4c2b28 3 2011-11-30T06:27:16.063Z cpu3:4011)NFSLock: 568: Start accessing fd 0x41000a4d8928 again 2011-11-30T06:35:30.827Z cpu1:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /tank/ISO, mounted as 5acdbb3e-410e56e3-0000-000000000000 ("ISO (1)") 2011-11-30T06:36:37.953Z cpu6:2054)NFS: 292: Restored connection to the server 10.13.111.197 mount point /tank/ISO, mounted as 5acdbb3e-410e56e3-0000-000000000000 ("ISO (1)") 2011-11-30T06:40:08.242Z cpu6:2054)NFSLock: 608: Stop accessing fd 0x41000a4c3e68 3 2011-11-30T06:40:34.647Z cpu3:2051)NFSLock: 568: Start accessing fd 0x41000a4d8928 again 2011-11-30T06:44:42.663Z cpu1:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:44:53.973Z cpu0:4011)NFS: 292: Restored connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:51:28.296Z cpu5:2058)NFSLock: 608: Stop accessing fd 0x41000ae3c528 3 2011-11-30T06:51:44.024Z cpu4:2052)NFSLock: 568: Start accessing fd 0x41000ae3b8e8 again 2011-11-30T06:56:30.758Z cpu4:2058)WARNING: NFS: 283: Lost connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T06:56:53.389Z cpu7:2055)NFS: 292: Restored connection to the server 10.13.111.197 mount point /sastank/VMStorage, mounted as f0342e1c-19be66b5-0000-000000000000 ("SAStank") 2011-11-30T07:01:50.350Z cpu6:2054)ScsiDeviceIO: 2316: Cmd(0x41240072bc80) 0x12, CmdSN 0x9803 to dev "naa.600508e000000000505c16815a36c50d" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. 2011-11-30T07:03:48.449Z cpu3:2051)NFSLock: 608: Stop accessing fd 0x41000ae46b68 3 2011-11-30T07:03:57.318Z cpu4:4009)NFSLock: 568: Start accessing fd 0x41000ae48228 again (I've put a complete dump from one of the hosts on pastebin: http://pastebin.com/Vn60wgTt) When I got in the office at 9am, I saw various failures and alarms and troubleshooted the issue. It turned out that pretty much all of the VMs were inaccessible, and that the ESX hosts either were describing each VM as 'powered off', 'powered on', or 'unavailable'. The VMs described as 'powered on' where not in any way reachable or responding to pings, so this may be lies. There's absolutely no indication on the X1600 that anything was awry, and nothing on the switches to indicate any loss of connectivity. I only managed to resolve the issue by rebooting the ESX hosts in turn. I have a number of questions: What the hell happened? If this was a temporary NFS failure, why did it put the ESX hosts into a state from which a reboot was the only recovery? In the future, when the NFS server goes a little off-piste, what would be the best approach to add some resilience? I've been looking at budgeting for next year and potentially have budget to purchase another X1600/D2700/disks, would an identical mirrored disk setup help to mitigate these sorts of failures automatically? Edit (Added requested details) To expand with some details as requested: The X1600 has 12x 1TB disks lumped together in mirrored pairs as tank, and the D2700 (connected with a mini SAS cable) has 12x 300GB 10k SAS disks lumped together in mirrored pairs as sastank zpool status pool: rpool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c7t0d0s0 ONLINE 0 0 0 errors: No known data errors pool: sastank state: ONLINE scan: scrub repaired 0 in 74h21m with 0 errors on Wed Nov 30 02:51:58 2011 config: NAME STATE READ WRITE CKSUM sastank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c7t14d0 ONLINE 0 0 0 c7t15d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c7t16d0 ONLINE 0 0 0 c7t17d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c7t18d0 ONLINE 0 0 0 c7t19d0 ONLINE 0 0 0 mirror-3 ONLINE 0 0 0 c7t20d0 ONLINE 0 0 0 c7t21d0 ONLINE 0 0 0 mirror-4 ONLINE 0 0 0 c7t22d0 ONLINE 0 0 0 c7t23d0 ONLINE 0 0 0 mirror-5 ONLINE 0 0 0 c7t24d0 ONLINE 0 0 0 c7t25d0 ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scan: scrub repaired 0 in 17h28m with 0 errors on Mon Nov 28 17:58:19 2011 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c7t1d0 ONLINE 0 0 0 c7t2d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c7t3d0 ONLINE 0 0 0 c7t4d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c7t5d0 ONLINE 0 0 0 c7t6d0 ONLINE 0 0 0 mirror-3 ONLINE 0 0 0 c7t8d0 ONLINE 0 0 0 c7t9d0 ONLINE 0 0 0 mirror-4 ONLINE 0 0 0 c7t10d0 ONLINE 0 0 0 c7t11d0 ONLINE 0 0 0 mirror-5 ONLINE 0 0 0 c7t12d0 ONLINE 0 0 0 c7t13d0 ONLINE 0 0 0 errors: No known data errors The filesystem exposed over NFS for the primary datastore is sastank/VMStorage zfs list NAME USED AVAIL REFER MOUNTPOINT rpool 45.1G 13.4G 92.5K /rpool rpool/ROOT 2.28G 13.4G 31K legacy rpool/ROOT/solaris 2.28G 13.4G 2.19G / rpool/dump 15.0G 13.4G 15.0G - rpool/export 11.9G 13.4G 32K /export rpool/export/home 11.9G 13.4G 32K /export/home rpool/export/home/andrew 11.9G 13.4G 11.9G /export/home/andrew rpool/swap 15.9G 29.2G 123M - sastank 1.08T 536G 33K /sastank sastank/VMStorage 1.01T 536G 1.01T /sastank/VMStorage sastank/comstar 71.7G 536G 31K /sastank/comstar sastank/comstar/sql_tempdb 6.31G 536G 6.31G - sastank/comstar/sql_tx_data 65.4G 536G 65.4G - tank 4.79T 578G 42K /tank tank/FTP 269G 578G 269G /tank/FTP tank/ISO 28.8G 578G 25.9G /tank/ISO tank/backupstage 2.64T 578G 2.49T /tank/backupstage tank/cifs 301G 578G 297G /tank/cifs tank/comstar 1.54T 578G 31K /tank/comstar tank/comstar/msdtc 1.07G 579G 32.8M - tank/comstar/quorum 577M 578G 47.9M - tank/comstar/sqldata 1.54T 886G 304G - tank/comstar/vsphere_lun 2.09G 580G 22.2M - tank/mcs-asset-repository 7.01M 578G 6.99M /tank/mcs-asset-repository tank/mscs-quorum 55K 578G 36K /tank/mscs-quorum tank/sccm 16.1G 578G 12.8G /tank/sccm As for the networking, all connections between the X1600, the Blades and the switch are either LACP or Etherchannel bonded 2x 1Gbit links. Switch is a single Cisco 3750. Storage traffic sits on its own VLAN segregated from VM machine traffic.

Read the article

< Previous Page | 1 2