Search Results

Search found 401 results on 17 pages for 'hadoop'.

Page 11/17 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 | Next Page >

Solving Big Problems with Oracle R Enterprise, Part II

- by dbayard

Part II – Solving Big Problems with Oracle R Enterprise In the first post in this series (see https://blogs.oracle.com/R/entry/solving_big_problems_with_oracle), we showed how you can use R to perform historical rate of return calculations against investment data sourced from a spreadsheet. We demonstrated the calculations against sample data for a small set of accounts. While this worked fine, in the real-world the problem is much bigger because the amount of data is much bigger. So much bigger that our approach in the previous post won’t scale to meet the real-world needs. From our previous post, here are the challenges we need to conquer: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In this post, we will show how we moved from sample data environment to working with full-scale data. This post is based on actual work we did for a financial services customer during a recent proof-of-concept. Getting started with the Database At this point, we have some sample data and our IRR function. We were at a similar point in our customer proof-of-concept exercise- we had sample data but we did not have the full customer data yet. So our database was empty. But, this was easily rectified by leveraging the transparency features of Oracle R Enterprise (see https://blogs.oracle.com/R/entry/analyzing_big_data_using_the). The following code shows how we took our sample data SimpleMWRRData and easily turned it into a new Oracle database table called IRR_DATA via ore.create(). The code also shows how we can access the database table IRR_DATA as if it was a normal R data.frame named IRR_DATA. If we go to sql*plus, we can also check out our new IRR_DATA table: At this point, we now have our sample data loaded in the database as a normal Oracle table called IRR_DATA. So, we now proceeded to test our R function working with database data. As our first test, we retrieved the data from a single account from the IRR_DATA table, pull it into local R memory, then call our IRR function. This worked. No SQL coding required! Going from Crawling to Walking Now that we have shown using our R code with database-resident data for a single account, we wanted to experiment with doing this for multiple accounts. In other words, we wanted to implement the split-apply-combine technique we discussed in our first post in this series. Fortunately, Oracle R Enterprise provides a very scalable way to do this with a function called ore.groupApply(). You can read more about ore.groupApply() here: https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1 Here is an example of how we ask ORE to take our IRR_DATA table in the database, split it by the ACCOUNT column, apply a function that calls our SimpleMWRR() calculation, and then combine the results. (If you are following along at home, be sure to have installed our myIRR package on your database server via “R CMD INSTALL myIRR”). The interesting thing about ore.groupApply is that the calculation is not actually performed in my desktop R environment from which I am running. What actually happens is that ore.groupApply uses the Oracle database to perform the work. And the Oracle database is what actually splits the IRR_DATA table by ACCOUNT. Then the Oracle database takes the data for each account and sends it to an embedded R engine running on the database server to apply our R function. Then the Oracle database combines all the individual results from the calls to the R function. This is significant because now the embedded R engine only needs to deal with the data for a single account at a time. Regardless of whether we have 20 accounts or 1 million accounts or more, the R engine that performs the calculation does not care. Given that normal R has a finite amount of memory to hold data, the ore.groupApply approach overcomes the R memory scalability problem since we only need to fit the data from a single account in R memory (not all of the data for all of the accounts). Additionally, the IRR_DATA does not need to be sent from the database to my desktop R program. Even though I am invoking ore.groupApply from my desktop R program, because the actual SimpleMWRR calculation is run by the embedded R engine on the database server, the IRR_DATA does not need to leave the database server- this is both a performance benefit because network transmission of large amounts of data take time and a security benefit because it is harder to protect private data once you start shipping around your intranet. Another benefit, which we will discuss in a few paragraphs, is the ability to leverage Oracle database parallelism to run these calculations for dozens of accounts at once. From Walking to Running ore.groupApply is rather nice, but it still has the drawback that I run this from a desktop R instance. This is not ideal for integrating into typical operational processes like nightly data warehouse refreshes or monthly statement generation. But, this is not an issue for ORE. Oracle R Enterprise lets us run this from the database using regular SQL, which is easily integrated into standard operations. That is extremely exciting and the way we actually did these calculations in the customer proof. As part of Oracle R Enterprise, it provides a SQL equivalent to ore.groupApply which it refers to as “rqGroupEval”. To use rqGroupEval via SQL, there is a bit of simple setup needed. Basically, the Oracle Database needs to know the structure of the input table and the grouping column, which we are able to define using the database’s pipeline table function mechanisms. Here is the setup script: At this point, our initial setup of rqGroupEval is done for the IRR_DATA table. The next step is to define our R function to the database. We do that via a call to ORE’s rqScriptCreate. Now we can test it. The SQL you use to run rqGroupEval uses the Oracle database pipeline table function syntax. The first argument to irr_dataGroupEval is a cursor defining our input. You can add additional where clauses and subqueries to this cursor as appropriate. The second argument is any additional inputs to the R function. The third argument is the text of a dummy select statement. The dummy select statement is used by the database to identify the columns and datatypes to expect the R function to return. The fourth argument is the column of the input table to split/group by. The final argument is the name of the R function as you defined it when you called rqScriptCreate(). The Real-World Results In our real customer proof-of-concept, we had more sophisticated calculation requirements than shown in this simplified blog example. For instance, we had to perform the rate of return calculations for 5 separate time periods, so the R code was enhanced to do so. In addition, some accounts needed a time-weighted rate of return to be calculated, so we extended our approach and added an R function to do that. And finally, there were also a few more real-world data irregularities that we needed to account for, so we added logic to our R functions to deal with those exceptions. For the full-scale customer test, we loaded the customer data onto a Half-Rack Exadata X2-2 Database Machine. As our half-rack had 48 physical cores (and 96 threads if you consider hyperthreading), we wanted to take advantage of that CPU horsepower to speed up our calculations. To do so with ORE, it is as simple as leveraging the Oracle Database Parallel Query features. Let’s look at the SQL used in the customer proof: Notice that we use a parallel hint on the cursor that is the input to our rqGroupEval function. That is all we need to do to enable Oracle to use parallel R engines. Here are a few screenshots of what this SQL looked like in the Real-Time SQL Monitor when we ran this during the proof of concept (hint: you might need to right-click on these images to be able to view the images full-screen to see the entire image): From the above, you can notice a few things (numbers 1 thru 5 below correspond with highlighted numbers on the images above. You may need to right click on the above images and view the images full-screen to see the entire image): The SQL completed in 110 seconds (1.8minutes) We calculated rate of returns for 5 time periods for each of 911k accounts (the number of actual rows returned by the IRRSTAGEGROUPEVAL operation) We accessed 103m rows of detailed cash flow/market value data (the number of actual rows returned by the IRR_STAGE2 operation) We ran with 72 degrees of parallelism spread across 4 database servers Most of our 110seconds was spent in the “External Procedure call” event On average, we performed 8,200 executions of our R function per second (110s/911k accounts) On average, each execution was passed 110 rows of data (103m detail rows/911k accounts) On average, we did 41,000 single time period rate of return calculations per second (each of the 8,200 executions of our R function did rate of return calculations for 5 time periods) On average, we processed over 900,000 rows of database data in R per second (103m detail rows/110s) R + Oracle R Enterprise: Best of R + Best of Oracle Database This blog post series started by describing a real customer problem: how to perform a lot of calculations on a lot of data in a short period of time. While standard R proved to be a very good fit for writing the necessary calculations, the challenge of working with a lot of data in a short period of time remained. This blog post series showed how Oracle R Enterprise enables R to be used in conjunction with the Oracle Database to overcome the data volume and performance issues (as well as simplifying the operations and security issues). It also showed that we could calculate 5 time periods of rate of returns for almost a million individual accounts in less than 2 minutes. In a future post, we will take the same R function and show how Oracle R Connector for Hadoop can be used in the Hadoop world. In that next post, instead of having our data in an Oracle database, our data will live in Hadoop and we will how to use the Oracle R Connector for Hadoop and other Oracle Big Data Connectors to move data between Hadoop, R, and the Oracle Database easily.

Read the article
Stream tar.gz file from FTP server

- by linker

Here is the situation: I have a tar.gz file on a FTP server which can contain an arbitrary number of files. Now what I'm trying to accomplish is have this file streamed and uploaded to HDFS through a Hadoop job. The fact that it's Hadoop is not important, in the end what I need to do is write some shell script that would take this file form ftp with wget and write the output to a stream. The reason why I really need to use streams is that there will be a large number of these files, and each file will be huge. It's fairly easy to do if I have a gzipped file and I'm doing something like this: wget -O - "ftp://${user}:${pass}@${host}/$file" | zcat But I'm not even sure if this is possible for a tar.gz file, especially since there are mutliple files in the archive. I'm a bit confused on what direction to take for this, any help would be greatly appreciated.

Read the article
The Oldest Big Data Problem: Parsing Human Language

- by dan.mcclary

There's a new whitepaper up on Oracle Technology Network which details the use of Digital Reasoning Systems' Synthesys software on Oracle Big Data Appliance. Digital Reasoning's approach is inherently "big data friendly," as it leverages multiple components of the Hadoop ecosystem. Moreover, the paper addresses the oldest big data problem of them all: extracting knowledge from human text. You can find the paper here. From the Executive Summary: There is a wealth of information to be extracted from natural language, but that extraction is challenging. The volume of human language we generate constitutes a natural Big Data problem, while its complexity and nuance requires a particular expertise to model and mine. In this paper we illustrate the impressive combination of Oracle Big Data Appliance and Digital Reasoning Synthesys software. The combination of Synthesys and Big Data Appliance makes it possible to analyze tens of millions of documents in a matter of hours. Moreover, this powerful combination achieves four times greater throughput than conducting the equivalent analysis on a much larger cloud-deployed Hadoop cluster.

Read the article
Should I be looking for developers with specific skill sets or generalists that need to learn?

- by Lostsoul

Thanks to the great help of this site and SO, I've been able to make a prototype of a software I want to sell but unfortunately although the prototype works I think my code quality is very low. I didn't use much OOP or design patterns so although my code is understandable to me, I think a normal developer would faint if they had to read it. So I wanted to hire a developer to make it a bit more better quality and improve some of my implementations of API's that I may have not done correctly. I'm having problems hiring a developer though. I have met 2 developers and had them read my software specs.The problem is, they lacked my business's domain knowledge(which is completely understandable and no biggie) but they also lacked knowledge of the underlying tech systems I used such as Hadoop, Hbase, Cuda, etc..I spent alot of time explaining map/reduce, bigtables and other technologies I used. I thought it was common knowledge because of my interactions with people on this site but the people I met with mentioned they never had to deal with these things so they didn't know it. My question is, for software projects that are hiring contractor developers is it a danger if the developer does not have experience with the underlying technologies? or can a general developer who is accomplished in another area realistically pick up new technologies? I did a very very quick back of envelope calculation and I think the upfront costs would be similar if I hire a student or developer with no experience in my technologies who will work many hours versus hiring a highly experienced developer who charges double but finishes in half the time but what other risks should I be considering or worried about? Also, should if I do hire a generalist, should I be paying for the time it takes them to learn hadoop or cuda if they are contractors(seems to make business sense but not sure how fair it is to them if they do not use the skill again). I'm a bit confused so any suggestions would be great.

Read the article
JVM (embarrasingly) parallel processing libraries/tools

- by Winterstream

I am looking for something that will make it easy to run (correctly coded) embarrassingly parallel JVM code on a cluster (so that I can use Clojure + Incanter). I have used Parallel Python in the past to do this. We have a new PBS cluster and our admin will soon set up IPython nodes that use PBS as the backend. Both of these systems make it almost a no-brainer to run certain types of code in a cluster. I made the mistake of using Hadoop in the past (Hadoop is just not suited to the kind of data that I use) - the latency made even small runs execute for 1-2 minutes. Is JPPF or Gridgain better for what I need? Does anyone here have any experience with either? Is there anything else you can recommend?

Read the article
Running your own GAE server

- by h2g2java

The question http://stackoverflow.com/questions/2505265/how-difficult-is-it-to-migrate-away-from-google-app-engine triggered me to think about this issue again. I have read of someone running, production-wise, Google app engine development version on their own server. My questions are: Are there any security issues running GAE development on your own server in production mode and exposing it to the www? If so how to mitigate them? Can GAE dev be run on Amazon? Is it possible to port my GAE apps running on Google servers to a GAE running on Amazon, without code changes, but without changing any reference in using other gdata services such as google docs, youtube, gmail, etc. How to configure GAE dev server to use my own hadoop? Or to use Amazon's hadoop?

Read the article
MapReduce on Azure

Is there an implementation of MapReduce/Hadoop on Azure?

Read the article
FileInputStream for a generic file System

- by Akhil

I have a file that contains java serialized objects like "Vector". I have stored this file over Hadoop Distributed File System(HDFS). Now I intend to read this file (using method readObject) in one of the map task. I suppose FileInputStream in = new FileInputStream("hdfs/path/to/file"); wont' work as the file is stored over HDFS. So I thought of using org.apache.hadoop.fs.FileSystem class. But Unfortunately it does not have any method that returns FileInputStream. All it has is a method that returns FSDataInputStream but I want a inputstream that can read serialized java objects like vector from a file rather than just primitive data types that FSDataInputStream would do. Please help!

Read the article
Java interface and abstract class issue

- by George2

Hello everyone, I am reading the book -- Hadoop: The Definitive Guide, http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979/ref=sr_1_1?ie=UTF8&s=books&qid=1273932107&sr=8-1 In chapter 2 (Page 25), it is mentioned "The new API favors abstract class over interfaces, since these are easier to evolve. For example, you can add a method (with a default implementation) to an abstract class without breaking old implementations of the class". What does it mean (especially what means "breaking old implementations of the class")? Appreciate if anyone could show me a sample why from this perspective abstract class is better than interface? thanks in advance, George

Read the article
Third Party Libraries and Technologies very Java Programmer must be aware of?

- by kunjaan

I agree that this is a very subjective question but as a student of Java , I get suggested good libraries and technologies for Java by my mentors at work. For example, I was not aware of Google Guice for Dependency Injection, awesomeness of Java Reflection APIs, ORMs like Hibernate or stuffs you could do with libraries like Hadoop. I want to collect and share some of the libraries that exemplifies good java programming (so that beginners like me could code walk and emulate the coding practice), teach unique concepts to Java (for example Dependency Injections or ORM) and/or are really interesting libraries that a student like me would get to do interesting projects on (eg. Hadoop). I redited this question 3 times to make it more specific : ). I am sorry if I am really not clear in my intentions. But some kind of a list of good concepts and third party libraries for Java could really help some of my intern friends here at work. Thank you.

Read the article
Third Party Libraries and Technologies every Java Programmer must be aware of?

- by kunjaan

I agree that this is a very subjective question but as a student of Java , I get suggested good libraries and technologies for Java by my mentors at work. For example, I was not aware of Google Guice for Dependency Injection, awesomeness of Java Reflection APIs, ORMs like Hibernate or stuffs you could do with libraries like Hadoop. I want to collect and share some of the libraries that exemplifies good java programming (so that beginners like me could code walk and emulate the coding practice), teach unique concepts to Java (for example Dependency Injections or ORM) and/or are really interesting libraries that a student like me would get to do interesting projects on (eg. Hadoop). I redited this question 3 times to make it more specific : ). I am sorry if I am really not clear in my intentions. But some kind of a list of good concepts and third party libraries for Java could really help some of my intern friends here at work. Thank you.

Read the article
Recommendations for distributed processing/distributed storage systems

- by Eddie

At my organization we have a processing and storage system spread across two dozen linux machines that handles over a petabyte of data. The system right now is very ad-hoc; processing automation and data management is handled by a collection of large perl programs on independent machines. I am looking at distributed processing and storage systems to make it easier to maintain, evenly distribute load and data with replication, and grow in disk space and compute power. The system needs to be able to handle millions of files, varying in size between 50 megabytes to 50 gigabytes. Once created, the files will not be appended to, only replaced completely if need be. The files need to be accessible via HTTP for customer download. Right now, processing is automated by perl scripts (that I have complete control over) which call a series of other programs (that I don't have control over because they are closed source) that essentially transforms one data set into another. No data mining happening here. Here is a quick list of things I am looking for: Reliability: These data must be accessible over HTTP about 99% of the time so I need something that does data replication across the cluster. Scalability: I want to be able to add more processing power and storage easily and rebalance the data on across the cluster. Distributed processing: Easy and automatic job scheduling and load balancing that fits with processing workflow I briefly described above. Data location awareness: Not strictly required but desirable. Since data and processing will be on the same set of nodes I would like the job scheduler to schedule jobs on or close to the node that the data is actually on to cut down on network traffic. Here is what I've looked at so far: Storage Management: GlusterFS: Looks really nice and easy to use but doesn't seem to have a way to figure out what node(s) a file actually resides on to supply as a hint to the job scheduler. GPFS: Seems like the gold standard of clustered filesystems. Meets most of my requirements except, like glusterfs, data location awareness. Ceph: Seems way to immature right now. Distributed processing: Sun Grid Engine: I have a lot of experience with this and it's relatively easy to use (once it is configured properly that is). But Oracle got its icy grip around it and it no longer seems very desirable. Both: Hadoop/HDFS: At first glance it looked like hadoop was perfect for my situation. Distributed storage and job scheduling and it was the only thing I found that would give me the data location awareness that I wanted. But I don't like the namename being a single point of failure. Also, I'm not really sure if the MapReduce paradigm fits the type of processing workflow that I have. It seems like you need to write all your software specifically for MapReduce instead of just using Hadoop as a generic job scheduler. OpenStack: I've done some reading on this but I'm having trouble deciding if it fits well with my problem or not. Does anyone have opinions or recommendations for technologies that would fit my problem well? Any suggestions or advise would be greatly appreciated. Thanks!

Read the article
The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article
Recap: Oracle Fusion Middleware Strategies Driving Business Innovation

- by Harish Gaur

Hasan Rizvi, Executive Vice President of Oracle Fusion Middleware & Java took the stage on Tuesday to discuss how Oracle Fusion Middleware helps enable business innovation. Through a series of product demos and customer showcases, Hassan demonstrated how Oracle Fusion Middleware is a complete platform to harness the latest technological innovations (cloud, mobile, social and Fast Data) throughout the application lifecycle. Fig 1: Oracle Fusion Middleware is the foundation of business innovation This Session included 4 demonstrations to illustrate these strategies: 1. Build and deploy native mobile applications using Oracle ADF Mobile 2. Empower business user to model processes, design user interface and have rich mobile experience for process interaction using Oracle BPM Suite PS6. 3. Create collaborative user experience and integrate social sign-on using Oracle WebCenter Portal, Oracle WebCenter Content, Oracle Social Network & Oracle Identity Management 11g R2 4. Deploy and manage business applications on Oracle Exalogic Nike, LA Department of Water & Power and Nintendo joined Hasan on stage to share how their organizations are leveraging Oracle Fusion Middleware to enable business innovation. Managing Performance in the Wrld of Social and Mobile How do you provide predictable scalability and performance for an application that monitors active lifestyle of 8 million users on a daily basis? Nike’s answer is Oracle Coherence, a component of Oracle Fusion Middleware and Oracle Exadata. Fig 2: Oracle Coherence enabled data grid improves performance of Nike+ Digital Sports Platform Nicole Otto, Sr. Director of Consumer Digital Technology discussed the vision of the Nike+ platform, a platform which represents a shift for NIKE from a "product" to a "product +" experience. There are currently nearly 8 million users in the Nike+ system who are using digitally-enabled Nike+ devices. Once data from the Nike+ device is transmitted to Nike+ application, users access the Nike+ website or via the Nike mobile applicatoin, seeing metrics around their daily active lifestyle and even engage in socially compelling experiences to compare, compete or collaborate their data with their friends. Nike expects the number of users to grow significantly this year which will drive an explosion of data and potential new experiences. To deal with this challenge, Nike envisioned building a shared platform that would drive a consumer-centric model for the company. Nike built this new platform using Oracle Coherence and Oracle Exadata. Using Coherence, Nike built a data grid tier as a distributed cache, thereby provide low-latency access to most recent and relevant data to consumers. Nicole discussed how Nike+ Digital Sports Platform is unique in the way that it utilizes the Coherence Grid. Nike takes advantage of Coherence as a traditional cache using both cache-aside and cache-through patterns. This new tier has enabled Nike to create a horizontally scalable distributed event-driven processing architecture. Current data grid volume is approximately 150,000 request per minute with about 40 million objects at any given time on the grid. Improving Customer Experience Across Multiple Channels Customer experience is on top of every CIO's mind. Customer Experience needs to be consistent and secure across multiple devices consumers may use. This is the challenge Matt Lampe, CIO of Los Angeles Department of Water & Power (LADWP) was faced with. Despite being the largest utilities company in the country, LADWP had been relying on a 38 year old customer information system for serving its customers. Their prior system had been unable to keep up with growing customer demands. Last year, LADWP embarked on a journey to improve customer experience for 1.6million LA DWP customers using Oracle WebCenter platform. Figure 3: Multi channel & Multi lingual LADWP.com built using Oracle WebCenter & Oracle Identity Management platform Matt shed light on his efforts to drive customer self-service across 3 dimensions – new website, new IVR platform and new bill payment service. LADWP has built a new portal to increase customer self-service while reducing the transactions via IVR. LADWP's website is powered Oracle WebCenter Portal and is accessible by desktop and mobile devices. By leveraging Oracle WebCenter, LADWP eliminated the need to build, format, and maintain individual mobile applications or websites for different devices. Their entire content is managed using Oracle WebCenter Content and secured using Oracle Identity Management. This new portal automated their paper based processes to web based workflows for customers. This includes automation of Self Service implemented through My Account - like Bill Pay, Payment History, Bill History and Usage Analysis. LADWP's solution went live in April 2012. Matt indicated that LADWP's Self-Service Portal has greatly improved customer satisfaction. In a JD Power Associates website satisfaction survey, results indicate rankings have climbed by 25+ points, marking a remarkable increase in user experience. Bolstering Performance and Simplifying Manageability of Business Applications Ingvar Petursson, Senior Vice Preisdent of IT at Nintendo America joined Hasan on-stage to discuss their choice of Exalogic. Nintendo had significant new requirements coming their way for business systems, both internal and external, in the years to come, especially with new products like the WiiU on the horizon this holiday season. Nintendo needed a platform that could give them performance, availability and ease of management as they deploy business systems. Ingvar selected Engineered Systems for two reasons: 1. High performance 2. Ease of management Figure 4: Nintendo relies on Oracle Exalogic to run ATG eCommerce, Oracle e-Business Suite and several business applications Nintendo made a decision to run their business applications (ATG eCommerce, E-Business Suite) and several Fusion Middleware components on the Exalogic platform. What impressed Ingvar was the "stress” testing results during evaluation. Oracle Exalogic could handle their 3-year load estimates for many functions, which was better than Nintendo expected without any hardware expansion. Faster Processing of Big Data Middleware plays an increasingly important role in Big Data. Last year, we announced at OpenWorld the introduction of Oracle Data Integrator for Hadoop and Oracle Loader for Hadoop which helps in the ability to move, transform, load data to and from Big Data Appliance to Exadata. This year, we’ve added new capabilities to find, filter, and focus data using Oracle Event Processing. This product can natively integrate with Big Data Appliance or runs standalone. Hasan briefly discussed how NTT Docomo, largest mobile operator in Japan, leverages Oracle Event Processing & Oracle Coherence to process mobile data (from 13 million smartphone users) at a speed of 700K events per second before feeding it Hadoop for distributed processing of big data. Figure 5: Mobile traffic data processing at NTT Docomo with Oracle Event Processing & Oracle Coherence

Read the article
"Well, Swing took a bit of a beating this week..."

- by Geertjan

One unique aspect of the NetBeans community presence at JavaOne 2012 was its usage of large panels to highlight and discuss various aspects (e.g., Java EE, JavaFX, etc) of NetBeans IDE usage and tools. For example, here's a pic of one of the panels, taken by Markus Eisele: Above you see me, Sean Comerford from ESPN.com, Gerrick Bivins from Halliburton, Angelo D'Agnano and Ioannis Kostaras from the NATO Programming Center, and Çagatay Çivici from PrimeFaces. (And Tinu Awopetu was also on the panel but not in the picture!) On one of those panels a remark was made which has kind of stuck with me. Henry Arousell, a member of the "NetBeans Platform Discussion Panel", who works on accounting software in Sweden, together with Thomas Boqvist, who was also at JavaOne, said, a bit despondently, I thought, the following words at the start of the demo of his very professional looking accounting software: "Well, Swing took a bit of a beating this week..." That remark comes in the light of several JavaFX sessions held at JavaOne, together with many sessions from the web and mobile worlds making the argument that the browser, tablet, and mobile platforms are the future of all applications everywhere. However, then I had another look at the list of Duke's Choice Award winners: http://www.oracle.com/us/corporate/press/1854931 OK, there are 10 winners of the Duke's Choice Award this year. Three of them (JDuchess, London Java Community, Student Nokia Developer Group) are not awards for software, but for people or groups. So, that leaves seven awards. Three of them (Hadoop, Jelastic, and Parleys) are, in one way or another, some kind of web-oriented solution, though both Hadoop and Jelastic are broader than that, but are service-oriented solutions, relating to cloud technologies. That leaves four others: NATO air defense software, Liquid Robotics software, AgroSense software, and UNHCR Refugee Registration software. All these are, on the software level, Java desktop solutions that, on the UI layer, make use of Java Swing, together with LuciadMaps (NATO), GeoToolkit (AgroSense), and WorldWind (Liquid Robotics). (And, it went even further than that, i.e., this is not passive usage of Swing but active and motivated: Timon Veenstra, during his AgroSense demo, said "There are far more Swing applications out there than we seem to think. Web developers just make more noise." And, during his Liquid Robotics demo, James Gosling said: "Not everything can be done in HTML.") Seems to me that Java Swing was the enabler of more Duke's Choice Award winners this year than any other UI-oriented Java technology. Now, I'm not going to interpret that one way or another, since I've noticed that interpretations of facts tend to validate some underlying agenda. Take any fact anywhere and you can interpret it to prove whatever opinion you're already holding to be true. Therefore, no interpretation from me. Simply stating the fact that Swing, far from taking a beating during JavaOne 2012, was a more significant user interface enabler of Duke's Choice Award winners than any other Java user interface technology. That's not an interpretation, but a fact.

Read the article
Oracle OpenWorld 2013 – Wrap up by Sven Bernhardt

- by JuergenKress

OOW 2013 is over and we’re heading home, so it is time to lean back and reflecting about the impressions we have from the conference. First of all: OOW was great! It was a pleasure to be a part of it. As already mentioned in our last blog article: It was the biggest OOW ever. Parallel to the conference the America’s Cup took place in San Francisco and the Oracle Team America won. Amazing job by the team and again congratulations from our side Back to the conference. The main topics for us are: Oracle SOA / BPM Suite 12c Adaptive Case management (ACM) Big Data Fast Data Cloud Mobile Below we will go a little more into detail, what are the key takeaways regarding the mentioned points: Oracle SOA / BPM Suite 12c During the five days at OOW, first details of the upcoming major release of Oracle SOA Suite 12c and Oracle BPM Suite 12c have been introduced. Some new key features are: Managed File Transfer (MFT) for transferring big files from a source to a target location Enhanced REST support by introducing a new REST binding Introduction of a generic cloud adapter, which can be used to connect to different cloud providers, like Salesforce Enhanced analytics with BAM, which has been totally reengineered (BAM Console now also runs in Firefox!) Introduction of templates (OSB pipelines, component templates, BPEL activities templates) EM as a single monitoring console OSB design-time integration into JDeveloper (Really great!) Enterprise modeling capabilities in BPM Composer These are only a few points from what is coming with 12c. We are really looking forward for the new realese to come out, because this seems to be really great stuff. The suite becomes more and more integrated. From 10g to 11g it was an evolution in terms of developing SOA-based applications. With 12c, Oracle continues it’s way – very impressive. Adaptive Case Management Another fantastic topic was Adaptive Case Management (ACM). The Oracle PMs did a great job especially at the demo grounds in showing the upcoming Case Management UI (will be available in 11g with the next BPM Suite MLR Patch), the roadmap and the differences between traditional business process modeling. They have been very busy during the conference because a lot of partners and customers have been interested Big Data Big Data is one of the current hype themes. Because of huge data amounts from different internal or external sources, the handling of these data becomes more and more challenging. Companies have a need for analyzing the data to optimize their business. The challenge is here: the amount of data is growing daily! To store and analyze the data efficiently, it is necessary to have a scalable and flexible infrastructure. Here it is important that hardware and software are engineered to work together. Therefore several new features of the Oracle Database 12c, like the new in-memory option, have been presented by Larry Ellison himself. From a hardware side new server machines like Fujitsu M10 or new processors, such as Oracle’s new M6-32 have been announced. The performance improvements, when using one of these hardware components in connection with the improved software solutions were really impressive. For more details about this, please take look at our previous blog post. Regarding Big Data, Oracle also introduced their Big Data architecture, which consists of: Oracle Big Data Appliance that is preconfigured with Hadoop Oracle Exdata which stores a huge amount of data efficently, to achieve optimal query performance Oracle Exalytics as a fast and scalable Business analytics system Analysis of the stored data can be performed using SQL, by streaming the data directly from Hadoop to an Oracle Database 12c. Alternatively the analysis can be directly implemented in Hadoop using “R”. In addition Oracle BI Tools can be used to analyze the data. Fast Data Fast Data is a complementary approach to Big Data. A huge amount of mostly unstructured data comes in via different channels with a high frequency. The analysis of these data streams is also important for companies, because the incoming data has to be analyzed regarding business-relevant patterns in real-time. Therefore these patterns must be identified efficiently and performant. To do so, in-memory grid solutions in combination with Oracle Coherence and Oracle Event Processing demonstrated very impressive how efficient real-time data processing can be. One example for Fast Data solutions that was shown during the OOW was the analysis of twitter streams regarding customer satisfaction. The feeds with negative words like “bad” or “worse” have been filtered and after a defined treshold has been reached in a certain timeframe, a business event was triggered. Cloud Another key trend in the IT market is of course Cloud Computing and what it means for companies and their businesses. Oracle announced their Cloud strategy and vision – companies can focus on their real business while all of the applications are available via Cloud. This also includes Oracle Database or Oracle Weblogic, so that companies can also build, deploy and run their own applications within the cloud. Three different approaches have been introduced: Infrastructure as a Service (IaaS) Platform as a Service (PaaS) Software as a Service (SaaS) Using the IaaS approach only the infrastructure components will be managed in the Cloud. Customers will be very flexible regarding memory, storage or number of CPUs because those parameters can be adjusted elastically. The PaaS approach means that besides the infrastructure also the platforms (such as databases or application servers) necessary for running applications will be provided within the Cloud. Here customers can also decide, if installation and management of these infrastructure components should be done by Oracle. The SaaS approach describes the most complete one, hence all applications a company uses are managed in the Cloud. Oracle is planning to provide all of their applications, like ERP systems or HR applications, as Cloud services. In conclusion this seems to be a very forward-thinking strategy, which opens up new possibilities for customers to manage their infrastructure and applications in a flexible, scalable and future-oriented manner. As you can see, our OOW days have been very very interresting. We collected many helpful informations for our projects. The new innovations presented at the confernce are great and being part of this was even greater! We are looking forward to next years’ conference! Links: http://www.oracle.com/openworld/index.html http://thecattlecrew.wordpress.com/2013/09/23/first-impressions-from-oracle-open-world-2013 SOA & BPM Partner Community For regular information on Oracle SOA Suite become a member in the SOA & BPM Partner Community for registration please visit www.oracle.com/goto/emea/soa (OPN account required) If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Facebook Wiki Mix Forum Technorati Tags: cattleCrew,Sven Bernhard,OOW2013,SOA Community,Oracle SOA,Oracle BPM,Community,OPN,Jürgen Kress

Read the article
Big Data Videos

- by Jean-Pierre Dijcks

You can view them all on YouTube using the following links: Overview for the Boss: http://youtu.be/ikJyrmKdJWc Hadoop: http://youtu.be/acWtid-OOWM Acquiring Big Data: http://youtu.be/TfuhuA_uaho Organizing Big Data: http://youtu.be/IC6jVRO2Hq4 Analyzing Big Data: http://youtu.be/2yf_jrBhz5w These videos are a great place to start learning about big data, the value it can bring to your organisation and how Oracle can help you start working with big data today.

Read the article
SQLAuthority News – Download Whitepaper – SQL Server Analysis Services to Hive

- by pinaldave

The SQL Server Analysis Service is a very interesting subject and I always have enjoyed learning about it. You can read my earlier article over here. Big Data is my new interest and I have been exploring it recently. During this weekend this blog post caught my attention and I enjoyed reading it. Big Data is the next big thing. The growth is predicted to be 60% per year till 2016. There is no single solution to the growing need of the big data available in the market right now as well there is no one solution in the business intelligence eco-system available as well. However, the need of the solution is ever increasing. I am personally Klout user. You can see my Klout profile over. I do understand what Klout is trying to achieve – a single place to measure the influence of the person. However, it works a bit mysteriously. There are plenty of social media available currently in the internet world. The biggest problem all the social media faces is that everybody opens an account but hardly people logs back in. To overcome this issue and have returned visitors Klout has come up with the system where visitors can give 5/10 K+ to other users in a particular area. Looking at all the activities Klout is doing it is indeed big consumer of the Big Data as well it is early adopter of the big data and Hadoop based system. Klout has to 1 trillion rows of data to be analyzed as well have nearly thousand terabyte warehouse. Hive the language used for Big Data supports Ad-Hoc Queries using HiveQL there are always better solutions. The alternate solution would be using SQL Server Analysis Services (SSAS) along with HiveQL. As there is no direct method to achieve there are few common workarounds already in place. A new ODBC driver from Klout has broken through the limitation and SQL Server Relation Engine can be used as an intermediate stage before SSAS. In this white paper the same solutions have been discussed in the depth. The white paper discusses following important concepts. The Klout Big Data solution Big Data Analytics based on Analysis Services Hadoop/Hive and Analysis Services integration Limitations of direct connectivity Pass-through queries to linked servers Best practices and lessons learned This white paper discussed all the important concepts which have enabled Klout to go go to the next level with all the offerings as well helped efficiency by offering a few out of the box solutions. I personally enjoy reading this white paper and I encourage all of you to do so. SQL Server Analysis Services to Hive Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL White Papers, T SQL, Technology

Read the article
MapRedux - PowerShell and Big Data

- by Dittenhafer Solutions

MapRedux – #PowerShell and #Big Data Have you been hearing about “big data”, “map reduce” and other large scale computing terms over the past couple of years and been curious to dig into more detail? Have you read some of the Apache Hadoop online documentation and unfortunately concluded that it wasn't feasible to setup a “test” hadoop environment on your machine? More recently, I have read about some of Microsoft’s work to enable Hadoop on the Azure cloud. Being a "Microsoft"-leaning technologist, I am more inclinded to be successful with experimentation when on the Windows platform. Of course, it is not that I am "religious" about one set of technologies other another, but rather more experienced. Anyway, within the past couple of weeks I have been thinking about PowerShell a bit more as the 2012 PowerShell Scripting Games approach and it occured to me that PowerShell's support for Windows Remote Management (WinRM), and some other inherent features of PowerShell might lend themselves particularly well to a simple implementation of the MapReduce framework. I fired up my PowerShell ISE and started writing just to see where it would take me. Quite simply, the ScriptBlock feature combined with the ability of Invoke-Command to create remote jobs on networked servers provides much of the plumbing of a distributed computing environment. There are some limiting factors of course. Microsoft provided some default settings which prevent PowerShell from taking over a network without administrative approval first. But even with just one adjustment, a given Windows-based machine can become a node in a MapReduce-style distributed computing environment. Ok, so enough introduction. Let's talk about the code. First, any machine that will participate as a remote "node" will need WinRM enabled for remote access, as shown below. This is not exactly practical for hundreds of intended nodes, but for one (or five) machines in a test environment it does just fine. C:> winrm quickconfig WinRM is not set up to receive requests on this machine. The following changes must be made: Set the WinRM service type to auto start. Start the WinRM service. Make these changes [y/n]? y Alternatively, you could take the approach described in the Remotely enable PSRemoting post from the TechNet forum and use PowerShell to create remote scheduled tasks that will call Enable-PSRemoting on each intended node. Invoke-MapRedux Moving on, now that you have one or more remote "nodes" enabled, you can consider the actual Map and Reduce algorithms. Consider the following snippet: $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose Invoke-MapRedux takes an instance of a MapReduceItem which references the Map and Reduce scriptblocks, an array of computer names which are the remote nodes, and the initial data set to be processed. As simple as that, you can start working with concepts of big data and the MapReduce paradigm. Now, how did we get there? I have published the initial version of my PsMapRedux PowerShell Module on GitHub. The PsMapRedux module provides the Invoke-MapRedux function described above. Feel free to browse the underlying code and even contribute to the project! In a later post, I plan to show some of the inner workings of the module, but for now let's move on to how the Map and Reduce functions are defined. Map Both the Map and Reduce functions need to follow a prescribed prototype. The prototype for a Map function in the MapRedux module is as follows. A simple scriptblock that takes one PsObject parameter and returns a hashtable. It is important to note that the PsObject $dataset parameter is a MapRedux custom object that has a "Data" property which offers an array of data to be processed by the Map function. $aMap = { Param ( [PsObject] $dataset ) # Indicate the job is running on the remote node. Write-Host ($env:computername + "::Map"); # The hashtable to return $list = @{}; # ... Perform the mapping work and prepare the $list hashtable result with your custom PSObject... # ... The $dataset has a single 'Data' property which contains an array of data rows # which is a subset of the originally submitted data set. # Return the hashtable (Key, PSObject) Write-Output $list; } Reduce Likewise, with the Reduce function a simple prototype must be followed which takes a $key and a result $dataset from the MapRedux's partitioning function (which joins the Map results by key). Again, the $dataset is a MapRedux custom object that has a "Data" property as described in the Map section. $aReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) # The hashtable to return $redux = @{}; # Return Write-Output $redux; } All Together Now When everything is put together in a short example script, you implement your Map and Reduce functions, query for some starting data, build the MapReduxItem via New-MapReduxItem and call Invoke-MapRedux to get the process started: # Import the MapRedux and SQL Server providers Import-Module "MapRedux" Import-Module “sqlps” -DisableNameChecking # Query the database for a dataset Set-Location SQLSERVER:\sql\dbserver1\default\databases\myDb $query = "SELECT MyKey, Date, Value1 FROM BigData ORDER BY MyKey"; Write-Host "Query: $query" $dataset = Invoke-SqlCmd -query $query # Build the Map function $MyMap = { Param ( [PsObject] $dataset ) Write-Host ($env:computername + "::Map"); $list = @{}; foreach($row in $dataset.Data) { # Write-Host ("Key: " + $row.MyKey.ToString()); if($list.ContainsKey($row.MyKey) -eq $true) { $s = $list.Item($row.MyKey); $s.Sum += $row.Value1; $s.Count++; } else { $s = New-Object PSObject; $s | Add-Member -Type NoteProperty -Name MyKey -Value $row.MyKey; $s | Add-Member -type NoteProperty -Name Sum -Value $row.Value1; $list.Add($row.MyKey, $s); } } Write-Output $list; } $MyReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) $redux = @{}; $count = 0; foreach($s in $dataset.Data) { $sum += $s.Sum; $count += 1; } # Reduce $redux.Add($s.MyKey, $sum / $count); # Return Write-Output $redux; } # Create the item data $Mr = New-MapReduxItem "My Test MapReduce Job" $MyMap $MyReduce # Array of processing nodes... $MyNodes = ("node1", "node2", "node3", "node4", "localhost") # Run the Map Reduce routine... $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose # Show the results Set-Location C:\ $MyMrResults | Out-GridView Conclusion I hope you have seen through this article that PowerShell has a significant infrastructure available for distributed computing. While it does take some code to expose a MapReduce-style framework, much of the work is already done and PowerShell could prove to be the the easiest platform to develop and run big data jobs in your corporate data center, potentially in the Azure cloud, or certainly as an academic excerise at home or school. Follow me on Twitter to stay up to date on the continuing progress of my Powershell MapRedux module, and thanks for reading! Daniel

Read the article
ArchBeat Link-o-Rama for 2012-06-28

- by Bob Rhubart

Oracle Magazine Technologist of the Year Awards to honor architects at #OOW12 Seven of the ten categories in this year's Oracle Magazine Technologist of the Year Awards are designated to celebrate architects. The winners will be honored at Oracle OpenWorld -- and showered with adulation from their colleagues. Nominations for these awards close on Tuesday July 17, so make sure you submit your nominations right away. Oracle E-Business Suite 12 Certified on Additional Linux Platforms (Oracle E-Business Suite Technology) Oracle E-Business Suite Release 12 (12.1.1 and higher) is now certified on the following additional Linux x86/x86-64 operating systems: Oracle Linux 6 (32-bit), Red Hat Enterprise Linux 6 (32-bit), Red Hat Enterprise Linux 6 (64-bit), and Novell SUSE Linux Enterprise Server (SLES) version 11 (64-bit). FairScheduling Conventions in Hadoop (The Data Warehouse Insider)"If you're going to have several concurrent users and leverage the more interactive aspects of the Hadoop environment (e.g. Pig and Hive scripting), the FairScheduler is definitely the way to go," says Dan McClary. Learn how in his technical post. SOA Learning Library (SOA & BPM Partner Community Blog) The Oracle Learning Library offers a vast collection of e-learning resources covering a mind-boggling array of products and topics. And it's all free—if you have an Oracle.com membership. And if you don't, that's free, too. Could this be any easier? Oracle Fusion Middleware Security: LibOVD: when and how | Andre Correa Fusion Middleware A-Team blogger Andre Correa offers some background on LibOVD and shares technical tips for its use. Virtual Developer Day: Oracle Fusion Development Yes, it's called "Developer Day," but there's plenty for architects, too. This free event includes hands-on labs, live Q&A with product experts, and a dizzying amount of technical information about Oracle ADF and Fusion Development -- all without having to pack a bag or worry about getting stuck in a seat between two professional wrestlers. Tuesday, July 10, 2012 9:00 a.m. PT – 1:00 p.m. PT 11:00 a.m. CT – 3:00 p.m. CT 12:00 p.m. ET – 4:00 p.m. ET 1:00 p.m. BRT – 5:00 p.m. BRT Thought for the Day "Computers allow you to make more mistakes faster than any other invention in human history with the possible exception of handguns and tequila." — Mitch Ratcliffe Source: SoftwareQuotes.com

Read the article
Cloud Computing: Start with the problem

- by BuckWoody

At one point in my life I would build my own computing system for home use. I wanted a particular video card, a certain set of drives, and a lot of memory. Not only could I not find those things in a vendor’s pre-built computer, but those were more expensive – by a lot. As time moved on and the computing industry matured, I actually find that I can buy a vendor’s system as cheaply – and in some cases far more cheaply – than I can build it myself. This paradigm holds true for almost any product, even clothing and furniture. And it’s also held true for software… Mostly. If you need an office productivity package, you simply buy one or use open-sourced software for that. There’s really no need to write your own Word Processor – it’s kind of been done a thousand times over. Even if you need a full system for customer relationship management or other needs, you simply buy one. But there is no “cloud solution in a box”. Sure, if you’re after “Software as a Service” – type solutions, like being able to process video (Windows Azure Media Services) or running a Pig or Hive job in Hadoop (Hadoop on Windows Azure) you can simply use one of those, or if you just want to deploy a Virtual Machine (Windows Azure Virtual Machines) you can get that, but if you’re looking for a solution to a problem your organization has, you may need to mix Software, Infrastructure, and perhaps even Platforms (such as Windows Azure Computing) to solve the issue. It’s all about starting from the problem-end first. We’ve become so accustomed to looking for a box of software that will solve the problem, that we often start with the solution and try to fit it to the problem, rather than the other way around. When I talk with my fellow architects at other companies, one of the hardest things to get them to do is to ignore the technology for a moment and describe what the issues are. It’s interesting to monitor the conversation and watch how many times we deviate from the problem into the solution. So, in your work today, try a little experiment: watch how many times you go after a problem by starting with the solution. Tomorrow, make a conscious effort to reverse that. You might be surprised at the results.

Read the article
Oracle Social Analytics with the Big Data Appliance

- by thegreeneman

Found an awesome demo put together by one of the Oracle NoSQL Database partners, eDBA, on using the Big Data Appliance to do social analytics. In this video, James Anthony is showing off the BDA, Hadoop, the Oracle Big Data Connectors and how they can be used and integrated with the Oracle Database to do an end-to-end sentiment analysis leveraging twitter data. A really great demo worth the view.

Read the article
Big Data – Buzz Words: What is MapReduce – Day 7 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is Hadoop. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – MapReduce. What is MapReduce? MapReduce was designed by Google as a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. Though, MapReduce was originally Google proprietary technology, it has been quite a generalized term in the recent time. MapReduce comprises a Map() and Reduce() procedures. Procedure Map() performance filtering and sorting operation on data where as procedure Reduce() performs a summary operation of the data. This model is based on modified concepts of the map and reduce functions commonly available in functional programing. The library where procedure Map() and Reduce() belongs is written in many different languages. The most popular free implementation of MapReduce is Apache Hadoop which we will explore tomorrow. Advantages of MapReduce Procedures The MapReduce Framework usually contains distributed servers and it runs various tasks in parallel to each other. There are various components which manages the communications between various nodes of the data and provides the high availability and fault tolerance. Programs written in MapReduce functional styles are automatically parallelized and executed on commodity machines. The MapReduce Framework takes care of the details of partitioning the data and executing the processes on distributed server on run time. During this process if there is any disaster the framework provides high availability and other available modes take care of the responsibility of the failed node. As you can clearly see more this entire MapReduce Frameworks provides much more than just Map() and Reduce() procedures; it provides scalability and fault tolerance as well. A typical implementation of the MapReduce Framework processes many petabytes of data and thousands of the processing machines. How do MapReduce Framework Works? A typical MapReduce Framework contains petabytes of the data and thousands of the nodes. Here is the basic explanation of the MapReduce Procedures which uses this massive commodity of the servers. Map() Procedure There is always a master node in this infrastructure which takes an input. Right after taking input master node divides it into smaller sub-inputs or sub-problems. These sub-problems are distributed to worker nodes. A worker node later processes them and does necessary analysis. Once the worker node completes the process with this sub-problem it returns it back to master node. Reduce() Procedure All the worker nodes return the answer to the sub-problem assigned to them to master node. The master node collects the answer and once again aggregate that in the form of the answer to the original big problem which was assigned master node. The MapReduce Framework does the above Map () and Reduce () procedure in the parallel and independent to each other. All the Map() procedures can run parallel to each other and once each worker node had completed their task they can send it back to master code to compile it with a single answer. This particular procedure can be very effective when it is implemented on a very large amount of data (Big Data). The MapReduce Framework has five different steps: Preparing Map() Input Executing User Provided Map() Code Shuffle Map Output to Reduce Processor Executing User Provided Reduce Code Producing the Final Output Here is the Dataflow of MapReduce Framework: Input Reader Map Function Partition Function Compare Function Reduce Function Output Writer In a future blog post of this 31 day series we will explore various components of MapReduce in Detail. MapReduce in a Single Statement MapReduce is equivalent to SELECT and GROUP BY of a relational database for a very large database. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – HDFS. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Markus Zirn, "Big Data with CEP and SOA" @ SOA, Cloud & Service Technology Symposium 2012

- by JuergenKress

ORACLE PROMOTIONAL DISCOUNT FOR EXCLUSIVE ORACLE DISCOUNT, ENTER PROMO CODE: DJMXZ370 Early-Bird Registration is Now Open with Special Pricing! Register before July 1, 2012 to qualify for discounts. Visit the Registration page for details. The International SOA, Cloud + Service Technology Symposium is a yearly event that features the top experts and authors from around the world, providing a series of keynotes, talks, demonstrations, and panels, as well as training and certification workshops - all dedicated to empowering IT professionals to realize modern service technologies and practices in the real world. Click here for a two-page printable conference overview (PDF). Big Data with CEP and SOA - September 25, 2012 - 14:15 Speaker: Markus Zirn, Oracle and Baz Kuthi, Avocent The "Big Data" trend is driving new kinds of IT projects that process machine-generated data. Such projects store and mine using Hadoop/ Map Reduce, but they also analyze streaming data via event-driven patterns, which can be called "Fast Data" complementary to "Big Data". This session highlights how "Big Data" and "Fast Data" design patterns can be combined with SOA design principles into modern, event-driven architectures. We will describe specific architectures that combines CEP, Distributed Caching, Event-driven Network, SOA Composites, Application Development Framework, as well as Hadoop. Architecture patterns include pre-processing and filtering event streams as close as possible to the event source, in memory master data for event pattern matching, event-driven user interfaces as well as distributed event processing. Focus is on how "Fast Data" requirements are elegantly integrated into a traditional SOA architecture. Markus Zirn is Vice President of Product Management covering Oracle SOA Suite, SOA Governance, Application Integration Architecture, BPM, BPM Solutions, Complex Event Processing and UPK, an end user learning solution. He is the author of “The BPEL Cookbook” (rated best book on Services Oriented Architecture in 2007) as well as “Fusion Middleware Patterns”. Previously, he was a management consultant with Booz Allen & Hamilton’s High Tech practice in Duesseldorf as well as San Francisco and Vice President of Product Marketing at QUIQ. Mr. Zirn holds a Masters of Electrical Engineering from the University of Karlsruhe and is an alumnus of the Tripartite program, a joint European degree from the University of Karlsruhe, Germany, the University of Southampton, UK, and ESIEE, France. KEYNOTES & SPEAKERS More than 80 international subject matter experts will be speaking at the Symposium. Below are confirmed keynotes and speakers so far. Over 50% of the agenda has not yet been finalized. Many more speakers to come. View the partial program calendars on the Conference Agenda page. CONFERENCE THEMES & TRACKS Cloud Computing Architecture & Patterns New SOA & Service-Orientation Practices & Models Emerging Service Technology Innovation Service Modeling & Analysis Techniques Service Infrastructure & Virtualization Cloud-based Enterprise Architecture Business Planning for Cloud Computing Projects Real World Case Studies Semantic Web Technologies (with & without the Cloud) Governance Frameworks for SOA and/or Cloud Computing Projects Service Engineering & Service Programming Techniques Interactive Services & the Human Factor New REST & Web Services Tools & Techniques Oracle Specialized SOA & BPM Partners Oracle Specialized partners have proven their skills by certifications and customer references. To find a local Specialized partner please visit http://solutions.oracle.com SOA & BPM Partner Community For regular information on Oracle SOA Suite become a member in the SOA & BPM Partner Community for registration please visit www.oracle.com/goto/emea/soa (OPN account required) If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Mix Forum Technorati Tags: Markus Zirn,SOA Symposium,Thomas Erl,SOA Community,Oracle SOA,Oracle BPM,BPM Community,OPN,Jürgen Kress

Read the article
Java Spotlight Episode 103: 2012 Duke Choice Award Winners

- by Roger Brinkley

Our annual interview with the 2012 Duke Choice Award Winners recorded live at the JavaOne 2012. Right-click or Control-click to download this MP3 file. You can also subscribe to the Java Spotlight Podcast Feed to get the latest podcast automatically. If you use iTunes you can open iTunes and subscribe with this link: Java Spotlight Podcast in iTunes. Show Notes Events Oct 13, Devoxx 4 Kids Nederlands Oct 15-17, JAX London Oct 20, Devoxx 4 Kids Français Oct 22-23, Freescale Technology Forum - Japan, Tokyo Oct 30-Nov 1, Arm TechCon, Santa Clara Oct 31, JFall, Netherlands Nov 2-3, JMagreb, Morocco Nov 13-17, Devoxx, Belgium Feature Interview Duke Choice Award Winners 2012 - Show Presentation London Java CommunityThe second user group receiving a Duke’s Choice Award this year, the London Java Community (LJC) and its users have been active in the OpenJDK, the Java Community Process (JCP) and other efforts within the global Java community. Student Nokia Developer GroupThis year’s student winner, Ram Kashyap, is the founder and president of the Nokia Student Network, and was profiled in the “The New Java Developers” feature in the March/April 2012 issue of Java Magazine. Since then, Ram has maintained a hectic pace, graduating from the People’s Education Society Institute of Technology in Bangalore, India, while working on a Java mobile startup and training students on Java ME. Jelastic, Inc.Moving existing Java applications to the cloud can be a daunting task, but startup Jelastic, Inc. offers the first all-Java platform-as-a-service (PaaS) that enables existing Java applications to be deployed in the cloud without code changes or lock-in. NATOThe first-ever Community Choice Award goes to the MASE Integrated Console Environment (MICE) in use at NATO. Built in Java on the NetBeans platform, MICE provides a high-performance visualization environment for conducting air defense and battle-space operations. DuchessRather than focus on a specific geographic area like most Java User Groups (JUGs), Duchess fosters the participation of women in the Java community worldwide. The group has more than 500 members in 60 countries, and provides a platform through which women can connect with each other and get involved in all aspects of the Java community. AgroSense ProjectImproving farming methods to feed a hungry world is the goal of AgroSense, an open source farm information management system built in Java and the NetBeans platform. AgroSense enables farmers, agribusinesses, suppliers and others to develop modular applications that will easily exchange information through a common underlying NetBeans framework. Apache Software Foundation Hadoop ProjectThe Apache Software Foundation’s Hadoop project, written in Java, provides a framework for distributed processing of big data sets across clusters of computers, ranging from a few servers to thousands of machines. This harnessing of large data pools allows organizations to better understand and improve their business. Parleys.comE-learning specialist Parleys.com, based in Brussels, Belgium, uses Java technologies to bring online classes and full IT conferences to desktops, laptops, tablets and mobile devices. Parleys.com has hosted more than 1,700 conferences—including Devoxx and JavaOne—for more than 800,000 unique visitors. Winners not presenting at JavaOne 2012 Duke Choice Awards BOF Liquid RoboticsRobotics – Liquid Robotics is an ocean data services provider whose Wave Glider technology collects information from the world’s oceans for application in government, science and commercial applications. The organization features the “father of Java” James Gosling as its chief software architect.United Nations High Commissioner for RefugeesThe United Nations High Commissioner for Refugees (UNHCR) is on the front lines of crises around the world, from civil wars to natural disasters. To help facilitate its mission of humanitarian relief, the UNHCR has developed a light-client Java application on the NetBeans platform. The Level One registration tool enables the UNHCR to collect information on the number of refugees and their water, food, housing, health, and other needs in the field, and combines that with geocoding information from various sources. This enables the UNHCR to deliver the appropriate kind and amount of assistance where it is needed.

Read the article

Search Results

Search found 401 results on 17 pages for 'hadoop'.

Page 11/17 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 | Next Page >

- by dbayard

- by linker

- by dan.mcclary

- by Lostsoul

- by Winterstream

- by h2g2java

- by Akhil

- by George2

- by kunjaan

- by kunjaan

- by Eddie

- by BuckWoody

- by Harish Gaur

- by Geertjan

- by JuergenKress

- by Jean-Pierre Dijcks

- by pinaldave

- by Dittenhafer Solutions

- by Bob Rhubart

- by BuckWoody

- by thegreeneman

- by Pinal Dave

- by JuergenKress

- by Roger Brinkley

< Previous Page | 7 8 9 10 11 12 13 14 15 16 17 | Next Page >