Search Results

Search found 1706 results on 69 pages for 'distributed'.

Page 17/69 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24 | Next Page >

Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words - Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Windows Azure Recipe: Big Data

- by Clint Edmonson

As the name implies, what we’re talking about here is the explosion of electronic data that comes from huge volumes of transactions, devices, and sensors being captured by businesses today. This data often comes in unstructured formats and/or too fast for us to effectively process in real time. Collectively, we call these the 4 big data V’s: Volume, Velocity, Variety, and Variability. These qualities make this type of data best managed by NoSQL systems like Hadoop, rather than by conventional Relational Database Management System (RDBMS). We know that there are patterns hidden inside this data that might provide competitive insight into market trends. The key is knowing when and how to leverage these “No SQL” tools combined with traditional business such as SQL-based relational databases and warehouses and other business intelligence tools. Drivers Petabyte scale data collection and storage Business intelligence and insight Solution The sketch below shows one of many big data solutions using Hadoop’s unique highly scalable storage and parallel processing capabilities combined with Microsoft Office’s Business Intelligence Components to access the data in the cluster. Ingredients Hadoop – this big data industry heavyweight provides both large scale data storage infrastructure and a highly parallelized map-reduce processing engine to crunch through the data efficiently. Here are the key pieces of the environment: Pig - a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Mahout - a machine learning library with algorithms for clustering, classification and batch based collaborative filtering that are implemented on top of Apache Hadoop using the map/reduce paradigm. Hive - data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage. Directly accessible to Microsoft Office and other consumers via add-ins and the Hive ODBC data driver. Pegasus - a Peta-scale graph mining system that runs in parallel, distributed manner on top of Hadoop and that provides algorithms for important graph mining tasks such as Degree, PageRank, Random Walk with Restart (RWR), Radius, and Connected Components. Sqoop - a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases. Flume - a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large log data amounts to HDFS. Database – directly accessible to Hadoop via the Sqoop based Microsoft SQL Server Connector for Apache Hadoop, data can be efficiently transferred to traditional relational data stores for replication, reporting, or other needs. Reporting – provides easily consumable reporting when combined with a database being fed from the Hadoop environment. Training These links point to online Windows Azure training labs where you can learn more about the individual ingredients described above. Hadoop Learning Resources (20+ tutorials and labs) Huge collection of resources for learning about all aspects of Apache Hadoop-based development on Windows Azure and the Hadoop and Windows Azure Ecosystems SQL Azure (7 labs) Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data. See my Windows Azure Resource Guide for more guidance on how to get started, including links web portals, training kits, samples, and blogs related to Windows Azure.

Read the article
Pay in the future should make you think in the present

- by BuckWoody

Distributed Computing - and more importantly “-as-a-Service” models of computing have a different cost model. This is something that sounds obvious on the surface but it’s often forgotten during the design and coding phase of a project. In on-premises computing, we’re used to purchasing a server and all of the hardware infrastructure and software licenses needed not only for one project, but several. This is an up-front or “sunk” cost that we consume by running code the organization needs to perform its function. Using a direct connection over wires you’ve already paid for, we don’t often have to think about bandwidth, hits on the data store or the amount of compute we use - we just know more is better. In a pay-as-you-go model, however, each of these architecture decisions has a potential cost impact. The amount of data you store, the number of times you access it, and the amount you send back all come with a charge. The offset is that you don’t buy anything at all up-front, so that sunk cost is freed up. And financial professionals know that money now is worth more than money later. Saving that up-front cost allows you to invest it in other things. It’s not just that you’re using things that now cost money - it’s that the design itself in distributed computing has a cost impact. That can be a really good thing, such as when you dynamically add capacity for paying customers. If you can tie back the cost of a series of clicks to what a user will pay to do so, you can set a profit margin that is easy to track. Here’s a case in point: Assume you are using a large instance in Windows Azure to compute some data that you retrieve from a SQL Azure database. If you don’t monitor the path of the application, you may not know what you are really using. Since you’re paying by the size of the instance, it’s best to maximize it all the time. Recently I evaluated just this situation, and found that downsizing the instance and adding another one where needed, adding a caching function to the application, moving part of the data into Windows Azure tables not only increased the speed of the application, but reduced the cost and more closely tied the cost to the profit. The key is this: from the very outset - the design - make sure you include metrics to measure for the cost/performance (sometimes these are the same) for your application. Windows Azure opens up awesome new ways of doing things, so make sure you study distributed systems architecture before you try and force in the application design you have on premises into your new application structure.

Read the article
What are the real life implications for an Apache 2 license?

- by Duopixel

I want to use SVG Edit for project. This software is distributed under the Apache 2 license. I've seen that: all copies, modified or unmodified, are accompanied by a copy of the licence all modifications are clearly marked as being the work of the modifier all notices of copyright, trademark and patent rights are reproduced accurately in distributed copies the licensee does not use any trademarks that belong to the licensor Do these pertain to the code or should I display the license somewhere in the GUI? The orignal software displays a "powered by SVG Edit", is it ok if I remove this? And most importantly: what is the correct etiquette for doing this? I don't want to be an asshole, but at the same time I want to simplify the UI as much as possible and removing the link will be part of it if it's not considered rude.

Read the article
TG'10 Conference Set

Meeting of the nation's largest and most-comprehensive distributed cyberinfrastructure for open scientific research will take place August 2-5 Research - Computer Science - Cyberinfrastructure - National Science Foundation - Organizations

Read the article
TG'10 Conference Set

Meeting of the nation's largest and most-comprehensive distributed cyberinfrastructure for open scientific research will take place August 2-5 Research - Computer Science - Cyberinfrastructure - National Science Foundation - Organizations

Read the article
Self-Scaling Architecture for Group and Enterprise Computing

Better computing and communication for emergency personnel at disaster sites Computing - Business - Computer Science - Distributed Computing - Cloud computing

Read the article
Is Ubuntu MAAS free? Will it remain like that?

- by Bruno Pereira

Ubuntu MAAS, very cool, awesome in fact, looks like a unique tool for several jobs. It looks free, but part of its documentation starts already with clauses that would scare anyone with interest in it: Documentation is copy righted by Canonical; Documentation must be used only for non-commercial purposes; If documentation is distributed within the non-commercial clause you must retain copyright; It just sounds a lot for a guide on how to install MAAS + Juju + Openstack and that scares me a bit. Under what license is Ubuntu MAAS distributed and what would be the reasoning for being so worried about copyrighting a guide like that so heavily? Is Ubuntu MAAS free? Will it continue like that?

Read the article
Basic memcached question

- by Aadith

I have been reading up on distributed hashing. I learnt that consistent hashing is used for distributing the keys among cache machines. I also learnt that, a key is duplicated on mutiple caches to handle failure of cache hosts. But what I have come across on memcached doesn't seem to be in alignment with all this. I read that all cache nodes are independent of each other and that if a cache goes down, requests go to DB. Theres no mention of cache miss on a host resulting in the host directing the request to another host which could either be holding the key or is nearer to the key. Can you please tell me how these two fit together? Is memcached a very preliminary form of distributed hashing which doesnt have much sophistication?

Read the article
On what name should I claim copyright in open source software?

- by ONOZ

When I want to use the Apache 2.0 licence in my project, I should include this in the comments of my source code: Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. What name should I fill in for [name of copyright owner]? I am currently working alone on this project, but I'm going to release the source code so there might be other contributors in the near future.

Read the article
Hadoop:Only master node does the work

- by user287722

I've setup a Hadoop 2.2 cluster with 1 master node(namenode and secondary namenode) and 3 slave nodes(datanode and namenode on each one).All of the machines use Linux Mint 64bit. When I run my MapReduce program, writen in Java, I can only see that master node is using extra CPU and RAM. Slave nodes are not doing a thing. I've checked the logs from all of the namenodes and there is nothing wrong with the namenodes on slave nodes. Resource Manager is running and all of the slave nodes can see the Resource Manager. I used this http://n0where.net/hadoop-2-2-multi-node-cluster-setup/ tutorial to configure my nodes. Datanodes are working in terms of distributed data storing but I can't see any indication of distributed data processing. Do I have to configure the xml configuration files in some other way so all of the machines will process data while I'm running my MapReduce Job?

Read the article
Report: Memcached Vendors Bulk Up for Web 2.0

Gear6 and Schooner roll out new solutions built on top of the open source memcached distributed caching effort.

Read the article
WS-Eventing for WCF (Indigo)

This article describes the design, implementation and usage of the WS-Eventing for distributed applications driven by new MS communication model WCF (Windows Communication Foundation)

Read the article
EMC Gives Data Domain New Replication Features

Cascaded replication and 180-to-1 remote site fan-in features are aimed at large distributed enterprises.

Read the article
Problem with MS DTC on SQL2008 win server 2k8 with linked server from sql2000 win server 2k

- by user31648

Hi, We have migrated our db from sql2000 win server 2k to sql2008 win server 2k8. We have linked server from sql2000 win server 2k. By our opinion the problem is with DTC and we have made a lot of setting that we found as solution for our problem, but still the problem exist. There is no any error or worning or information niether in the sql log nor in win event viewer. The application is hanging out and at the end the time out exception is shown. What we have done till now: Enable Network DTC Access with inbound and outbound with No Authentication Required on win 2k8 We have opened RPC dynamic port allocation through registry on 2k and 2k8 We have entered subkey TurnOffRpcSecurity in the registry HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC and made it enable on 2k and 2k8 We have added exception for DTC in firewall for all entities What we have notice that when we restart SQL service and make the first try for our transaction the following is shown: "Attempting to initialize Microsoft Distributed Transaction Coordinator (MS DTC). This is an informational message only. No user action is required." and after it: "Recovery of any in-doubt distributed transactions involving Microsoft Distributed Transaction Coordinator (MS DTC) has completed. This is an informational message only. No user action is required." Does someone have any idea what else can be done in order to solve the problem? Thanks in advance. Regards, Snezana

Read the article
Why Kiln is based on Mercurial, and not other (D)VCS

- by Jakub Narebski

What were the reason for chosing Mercurial as a basis of FogCreek Kiln, a source control management system with tightly integrated code review, and FogBugz integration? Why Mercurial, and not other (distributed) version control system, like Bazaar, Git or Monotone, or creating own version control system like Fossil (distributed software configuration management, including bug tracking and wiki) did? What were features that make FogCreek choose Mercurial as Kiln engine?

Read the article
collection of system properties using web browser

- by vishwa

hi i am doing distributed computing environment........For the applications need to get distributed to different clients connected to the server in the network,i prefered to collect the client's system properties like free memory available in the client's system,so that i could distribute d application according to that efficiently......so kindly project me wth some idea.thanks in advance

Read the article
How to generate a random BigInteger value in Java?

- by Bill the Lizard

I need to generate arbitrarily large random integers in the range 0 (inclusive) to n (exclusive). My initial thought was to call nextDouble and multiply by n, but once n gets to be larger than 253, the results would no longer be uniformly distributed. BigInteger has the following constructor available: public BigInteger(int numBits, Random rnd) Constructs a randomly generated BigInteger, uniformly distributed over the range 0 to (2numBits - 1), inclusive. How can this be used to get a random value in the range 0 - n, where n is not a power of 2?

Read the article
Choice and setup of version control

- by Peter M

I am about to set up an new laptop and in the process transition to a new version control system as part of a general cleanup. Currently I use a centralized version control system (yes it is VSS, and yes I know all the pro's and con's of that system, but as a single user system it works well for me). I have very little requirements for a new system and I am free to choose among any of the current mainstream players, but cost constraints will push me towards oss. Some of my requirements are: Runs on a single machine (ie the laptop in question) under windows I am not sharing things with other developers or workers - this is more for my own historical benefits. I want to version source code, documentation and binary files I have a large hierarchy of projects that are unrelated (see below) I have files within the hierarchy that don't need to be controlled (but could be) Some projects use Visual Studio, so some integration there could be nice. There could be some sharing of files between jobs. I generally only need a small about of branching in code files The directory hierarchy that I have at the moment is somewhat like: Root | |--Customer #1 | | | |--Job #1 | | | | | |--Data files received from Customer for Job (not controlled) | | |--Documentation files (controlled) | | |--Project information files (not controlled - but could be) | | |--Software Project Files (controlled) | | |--Scratch dir for job (not controlled) | | | |--Job #2 | | (same structure as above) | |--Customer #2 | |.. | |--Cusmtomer #n |.. Currently I have about 22 customers with differing numbers of projects underneath them. At the moment I have a single VSS repository based at the root of the directory structure. If I kept with a centralized system (ie SVN) I believe that I should keep the same approach and continue with a single repository based from the root dir. Is this a valid approach? However if I move to a distributed tool then I am unsure of how I should handle the situation. My initial guess is that I should not have a repository based on the root of my entire directory structure - but that is a guess so I really don't know how valid it is. Should I pitch a distributed approach at the Root, Customer, Job or sub-Job directory level? Also what I am not clear on with distributed tools (and perhaps with SVN as well), is if I can branch parts of a repository. For example, I can see branching source code in software projects as being useful, but branching my documentation as not being useful. So if I pitch a repository at the Job level, can I just branch the Software Project Files? Or would all files in that Job be branched? Every time I look at distributed tools I get a nagging feeling that they are not suited to my style of setup. I am uncomfortable with idea of having to manually set up something like 50 to 80 separate repositories (if I pitch at the Job level, or 20+ if at the Customer level) within my directory hierarchy. This feeling also extends to having all those repositories scattered around as well - however I do have a backup strategy that I trust, so this latter feeling is pretty well unfounded. So what advice can you all give me? Thanks in advance!

Read the article
trying to use mod_proxy with httpd and tomcat

- by techsjs2012

I been trying to use mod_proxy with httpd and tomcat... I have on VirtualBox running Scientific Linux which has httpd and tomcat 6 on it.. I made two nodes of tomcat6. I followed this guide like 10 times and still cant get the 2nd node of tomcat working.. http://www.richardnichols.net/2010/08/5-minute-guide-clustering-apache-tomcat/ Here is the lines from my http.conf file <Proxy balancer://testcluster stickysession=JSESSIONID> BalancerMember ajp://127.0.0.1:8009 min=10 max=100 route=node1 loadfactor=1 BalancerMember ajp://127.0.0.1:8109 min=10 max=100 route=node2 loadfactor=1 </Proxy> ProxyPass /examples balancer://testcluster/examples <Location /balancer-manager> SetHandler balancer-manager AuthType Basic AuthName "Balancer Manager" AuthUserFile "/etc/httpd/conf/.htpasswd" Require valid-user </Location> Now here is my server.xml from node1 <?xml version='1.0' encoding='utf-8'?>   <Server port="8005" shutdown="SHUTDOWN">  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />  <Listener className="org.apache.catalina.core.JasperListener" />  <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" />  <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" /> <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />  <GlobalNamingResources>  <Resource name="UserDatabase" auth="Container" type="org.apache.catalina.UserDatabase" description="User database that can be updated and saved" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" pathname="conf/tomcat-users.xml" /> </GlobalNamingResources>  <Service name="Catalina">         <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />   <Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">      <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>  <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true" xmlValidation="false" xmlNamespaceAware="false">     </Host> </Engine> </Service> </Server> now here is the server.xml file from node2 <?xml version='1.0' encoding='utf-8'?>   <Server port="8105" shutdown="SHUTDOWN">  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />  <Listener className="org.apache.catalina.core.JasperListener" />  <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" />  <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" /> <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />  <GlobalNamingResources>  <Resource name="UserDatabase" auth="Container" type="org.apache.catalina.UserDatabase" description="User database that can be updated and saved" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" pathname="conf/tomcat-users.xml" /> </GlobalNamingResources>  <Service name="Catalina">         <Connector port="8109" protocol="AJP/1.3" redirectPort="8443" />   <Engine name="Catalina" defaultHost="localhost" jvmRoute="node2">      <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>  <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true" xmlValidation="false" xmlNamespaceAware="false">     </Host> </Engine> </Service> </Server> I dont know what it is. but I been trying for days

Read the article
A Guide to Fusion SCM at Oracle OpenWorld 2012

- by Pam Petropoulos

Are you attending next week’s Oracle OpenWorld 2012 conference? Then you won’t want to miss the Fusion SCM activities and customer presenters from leading companies like Boeing and Fideltronik. Below you’ll find a day by day guide of the various Fusion SCM sessions, demos and activities during OpenWorld 2012, September 30 – October 4 in San Francisco, CA. Tuesday, October 2 All of the Fusion SCM sessions during OpenWorld will take place in various rooms at Moscone West, a convenience you are sure to appreciate, as will your feet. The first session at 10:15 – 11:15 am (Moscone West, Room 2006), entitled “Oracle Fusion Supply Chain Management: Overview, Strategy, Customer Experiences, and Roadmap”, provides an overview of Fusion Supply Chain Management applications and will discuss Fusion SCM strategy, future roadmap, and highlights of customer examples. The next session at 11:45 am – 12:45 pm (Moscone West, Room 2022), entitled “Enabling Trusted Enterprise Product Data with Oracle Fusion Product Hub”, may be the session for you if you’re struggling with achieving consistent, high-quality product data that provides significant business value. This session will discuss how Oracle Fusion Product Hub and Oracle Enterprise Data Quality can help you to achieve this vision. A customer presenter from Fideltronik will share their experiences with Oracle Fusion Product Hub. At the end of the day unwind at the Supply Chain Management customer reception from 6:00 – 8:00 pm at the Roe Lounge, located at 651 Howard Street. Registration is required. Click here for details. Wednesday, October 3 Wednesday is a busy day with three Fusion SCM sessions on the agenda. Start your day at 10:15 am at the “Oracle Fusion Supply Chain Management: Customer Adoption and Experiences” session (Moscone West, Room 2003). This must see session will showcase customer speakers from The Boeing Company and Fideltronik, each of whom will share their company’s experiences in selecting and implementing Fusion SCM applications. If you’re wondering how Fusion SCM applications can co-exist with your existing Oracle applications, then you’ll want to sit in on the 3:30 pm session entitled “Oracle Fusion Supply Chain Management: Coexistence with Other Oracle Applications” (Moscone West, Room 2003). Stick around until 5:00 pm for the final Fusion SCM session of the day entitled “Responsive Fulfillment with Oracle Fusion Supply Chain Management” (Moscone West, Room 2001). This session will showcase Oracle Fusion Distributed Order Orchestration and Oracle Fusion Global Order Promising and how they are changing the way companies manage order fulfillment in environments. In addition to discussing the current business challenges, product capabilities, value propositions, industry applicability, and future roadmap this session will also feature a customer presenter from The Boeing Company. Thursday, October 4 If you are a retail customer we highly recommend that you attend the final Fusion SCM session of the week at 12:45 pm, entitled “Multichannel Fulfillment Excellence in the Direct-to-Consumer Market” (Moscone West, Room 2024). Retailers will learn how they can transform their supply chains to meet the ever-increasing demands of buy anywhere/get anywhere cross-channel requirements with Fusion Distributed Order Orchestration and Oracle Fusion Product Hub. Throughout the week, you’ll also want to visit the Fusion SCM demo pods at the Demogrounds in Moscone West so you can see demos of these Fusion applications. Visit pod W-005 for Fusion Distributed Order Orchestration, W-008 for Fusion Inventory and Cost Management, and W-006 for Fusion Product Hub. Click here for the Demogrounds map. A reminder that you can also pre-register for these sessions to secure your spot. Visit the Schedule Builder to pre-enroll for these sessions. Finally, you'll also want to check out the Fusion SCM FocusOn document which includes additional keynote and general sessions that you may want to attend throughout the week. We look forward to seeing you in San Francisco next week.

Read the article
Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Pig and Pig Latin in Big Data Story. In this article we will understand what is Sqoop and Zookeeper in Big Data Story. There are two most important components one should learn when learning about interacting with Hadoop – Sqoop and Zookper. What is Sqoop? Most of the business stores their data in RDBMS as well as other data warehouse solutions. They need a way to move data to the Hadoop system to do various processing and return it back to RDBMS from Hadoop system. The data movement can happen in real time or at various intervals in bulk. We need a tool which can help us move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool which extract data from non-Hadoop data sources and transform them into the format which Hadoop can use it and later it loads them into HDFS. Essentially it is ETL tool where it Extracts, Transform and Load from SQL to Hadoop. The best part is that it also does extract data from Hadoop and loads them to Non-SQL (or RDBMS) data stores. Essentially, Sqoop is a command line tool which does SQL to Hadoop and Hadoop to SQL. It is a command line interpreter. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and easy to learn tool for nonprogrammers. What is Zookeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words Zookeeper is a replicated synchronization service with eventual consistency. In simpler words – in Hadoop cluster there are many different nodes and one node is master. Let us assume that master node fails due to any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is managing the writers as that task requires persistence in order of writing. In this kind of scenario Zookeeper will assign new master node and make sure that Hadoop cluster performs without any glitch. Zookeeper is the Hadoop’s method of coordinating all the elements of these distributed systems. Here are few of the tasks which Zookeepr is responsible for. Zookeeper manages the entire workflow of starting and stopping various nodes in the Hadoop’s cluster. In Hadoop cluster when any processes need certain configuration to complete the task. Zookeeper makes sure that certain node gets necessary configuration consistently. In case of the master node fails, Zookeepr can assign new master node and make sure cluster works as expected. There many other tasks Zookeeper performance when it is about Hadoop cluster and communication. Basically without the help of Zookeeper it is not possible to design any new fault tolerant distributed application. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Big Data Analytics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
What do you need to know to be a world-class master software developer? [closed]

- by glitch

I wanted to bring up this question to you folks and see what you think, hopefully advise me on the matter: let's say you had 30 years of learning and practicing software development in front of you, how would you dedicate your time so that you'd get the biggest bang for your buck. What would you both learn and work on to be a world-class software developer that would make a large impact on the industry and leave behind a legacy? I think that most great developers end up being both broad generalists and specialists in one-two areas of interest. I'm thinking Bill Joy, John Carmack, Linus Torvalds, K&R and so on. I'm thinking that perhaps one approach would be to break things down by categories and establish a base minimum of "software development" greatness. I'm thinking: Operating Systems: completely internalize the core concepts of OS, perhaps gain a lot of familiarity with an OSS one such as Linux. Anything from memory management to device drivers has to be complete second nature. Programming Languages: this is one of those topics that imho has to be fully grokked even if it might take many years. I don't think there's quite anything like going through the process of developing your own compiler, understanding language design trade-offs and so on. Programming Language Pragmatics is one of my favorite books actually, I think you want to have that internalized back to back, and that's just the start. You could go significantly deeper, but I think it's time well spent, because it's such a crucial building block. As a subset of that, you want to really understand the different programming paradigms out there. Imperative, declarative, logic, functional and so on. Anything from assembly to LISP should be at the very least comfortable to write in. Contexts: I believe one should have experience working in different contexts to truly be able to appreciate the trade-offs that are being made every day. Embedded, web development, mobile development, UX development, distributed, cloud computing and so on. Hardware: I'm somewhat conflicted about this one. I think you want some understanding of computer architecture at a low level, but I feel like the concepts that will truly matter will be slightly higher level, such as CPU caching / memory hierarchy, ILP, and so on. Networking: we live in a completely network-dependent era. Having a good understanding of the OSI model, knowing how the Web works, how HTTP works and so on is pretty much a pre-requisite these days. Distributed systems: once again, everything's distributed these days, it's getting progressively harder to ignore this reality. Slightly related, perhaps add solid understanding of how browsers work to that, since the world seems to be moving so much to interfacing with everything through a browser. Tools: Have a really broad toolset that you're familiar with, one that continuously expands throughout the years. Communication: I think being a great writer, effective communicator and a phenomenal team player is pretty much a prerequisite for a lot of a software developer's greatness. It can't be overstated. Software engineering: understanding the process of building software, team dynamics, the requirements of the business-side, all the pitfalls. You want to deeply understand where what you're writing fits from the market perspective. The better you understand all of this, the more of your work will actually see the daylight. This is really just a starting list, I'm confident that there's a ton of other material that you need to master. As I mentioned, you most likely end up specializing in a bunch of these areas as you go along, but I was trying to come up with a baseline. Any thoughts, suggestions and words of wisdom from the grizzled veterans out there who would like to share their thoughts and experiences with this? I'd really love to know what you think!

Read the article
What do these MS DTC Exceptions mean?

- by David B

I wrote a program to demonstrate the behavior of DTC timeouts with multiple threads. I'm getting several exceptions, seemingly at random. Are all of these simple Timeouts, or are some of them indicative of deeper problems (connection pool interactions, etc)? The Microsoft Distributed Transaction Coordinator (MS DTC) has cancelled the distributed transaction. Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction. The transaction associated with the current connection has completed but has not been disposed. The transaction must be disposed before the connection can be used to execute SQL statements. The operation is not valid for the state of the transaction. ExecuteReader requires an open and available Connection. The connection's current state is closed. Here's the data part of the code: using (DemoDataDataContext dc1 = new DemoDataDataContext(Conn1)) using (DemoDataDataContext dc2 = new DemoDataDataContext(Conn2)) { WriteMany(dc1, 100); //generate 100 records for insert WriteMany(dc2, 100000); //generate 100,000 records for insert Console.WriteLine("{0} : {1}", Name, " generated records for insert."); using (TransactionScope ts = new TransactionScope()) { dc1.SubmitChanges(); dc2.SubmitChanges(); ts.Complete(); } }

Read the article
Scalable / Parallel Large Graph Analysis Library?

- by Joel Hoff

I am looking for good recommendations for scalable and/or parallel large graph analysis libraries in various languages. The problems I am working on involve significant computational analysis of graphs/networks with 1-100 million nodes and 10 million to 1+ billion edges. The largest SMP computer I am using has 256 GB memory, but I also have access to an HPC cluster with 1000 cores, 2 TB aggregate memory, and MPI for communication. I am primarily looking for scalable, high-performance graph libraries that could be used in either single or multi-threaded scenarios, but parallel analysis libraries based on MPI or a similar protocol for communication and/or distributed memory are also of interest for high-end problems. Target programming languages include C++, C, Java, and Python. My research to-date has come up with the following possible solutions for these languages: C++ -- The most viable solutions appear to be the Boost Graph Library and Parallel Boost Graph Library. I have looked briefly at MTGL, but it is currently slanted more toward massively multithreaded hardware architectures like the Cray XMT. C - igraph and SNAP (Small-world Network Analysis and Partitioning); latter uses OpenMP for parallelism on SMP systems. Java - I have found no parallel libraries here yet, but JGraphT and perhaps JUNG are leading contenders in the non-parallel space. Python - igraph and NetworkX look like the most solid options, though neither is parallel. There used to be Python bindings for BGL, but these are now unsupported; last release in 2005 looks stale now. Other topics here on SO that I've looked at have discussed graph libraries in C++, Java, Python, and other languages. However, none of these topics focused significantly on scalability. Does anyone have recommendations they can offer based on experience with any of the above or other library packages when applied to large graph analysis problems? Performance, scalability, and code stability/maturity are my primary concerns. Most of the specialized algorithms will be developed by my team with the exception of any graph-oriented parallel communication or distributed memory frameworks (where the graph state is distributed across a cluster).

Read the article

Search Results

Search found 1706 results on 69 pages for 'distributed'.

Page 17/69 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24 | Next Page >

- by Pinal Dave

- by Clint Edmonson

- by BuckWoody

- by Duopixel

- by Bruno Pereira

- by Aadith

- by ONOZ

- by user287722

- by user31648

- by Jakub Narebski

- by vishwa

- by Bill the Lizard

- by Peter M

- by techsjs2012

- by Pam Petropoulos

- by Pinal Dave

- by glitch

- by David B

- by Joel Hoff

< Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24 | Next Page >