elephant - Page 3 - Developer IT

Running a simple integration scenario using the Oracle Big Data Connectors on Hadoop/HDFS cluster

- by hamsun

Between the elephant ( the tradional image of the Hadoop framework) and the Oracle Iron Man (Big Data..) an english setter could be seen as the link to the right data Data, Data, Data, we are living in a world where data technology based on popular applications , search engines, Webservers, rich sms messages, email clients, weather forecasts and so on, have a predominant role in our life. More and more technologies are used to analyze/track our behavior, try to detect patterns, to propose us "the best/right user experience" from the Google Ad services, to Telco companies or large consumer sites (like Amazon:) ). The more we use all these technologies, the more we generate data, and thus there is a need of huge data marts and specific hardware/software servers (as the Exadata servers) in order to treat/analyze/understand the trends and offer new services to the users. Some of these "data feeds" are raw, unstructured data, and cannot be processed effectively by normal SQL queries. Large scale distributed processing was an emerging infrastructure need and the solution seemed to be the "collocation of compute nodes with the data", which in turn leaded to MapReduce parallel patterns and the development of the Hadoop framework, which is based on MapReduce and a distributed file system (HDFS) that runs on larger clusters of rather inexpensive servers. Several Oracle products are using the distributed / aggregation pattern for data calculation ( Coherence, NoSql, times ten ) so once that you are familiar with one of these technologies, lets says with coherence aggregators, you will find the whole Hadoop, MapReduce concept very similar. Oracle Big Data Appliance is based on the Cloudera Distribution (CDH), and the Oracle Big Data Connectors can be plugged on a Hadoop cluster running the CDH distribution or equivalent Hadoop clusters. In this paper, a "lab like" implementation of this concept is done on a single Linux X64 server, running an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0, and a single node Apache hadoop-1.2.1 HDFS cluster, using the SQL connector for HDFS. The whole setup is fairly simple: Install on a Linux x64 server ( or virtual box appliance) an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 server Get the Apache Hadoop distribution from: http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-1.2.1. Get the Oracle Big Data Connectors from: http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html?ssSourceSiteId=ocomen. Check the java version of your Linux server with the command: java -version java version "1.7.0_40" Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) Decompress the hadoop hadoop-1.2.1.tar.gz file to /u01/hadoop-1.2.1 Modify your .bash_profile export HADOOP_HOME=/u01/hadoop-1.2.1 export PATH=$PATH:$HADOOP_HOME/bin export HIVE_HOME=/u01/hive-0.11.0 export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin (also see my sample .bash_profile) Set up ssh trust for Hadoop process, this is a mandatory step, in our case we have to establish a "local trust" as will are using a single node configuration copy the new public keys to the list of authorized keys connect and test the ssh setup to your localhost: We will run a "pseudo-Hadoop cluster", in what is called "local standalone mode", all the Hadoop java components are running in one Java process, this is enough for our demo purposes. We need to "fine tune" some Hadoop configuration files, we have to go at our $HADOOP_HOME/conf, and modify the files: core-site.xml hdfs-site.xml mapred-site.xml check that the hadoop binaries are referenced correctly from the command line by executing: hadoop -version As Hadoop is managing our "clustered HDFS" file system we have to create "the mount point" and format it , the mount point will be declared to core-site.xml as: The layout under the /u01/hadoop-1.2.1/data will be created and used by other hadoop components (MapReduce = /mapred/...) HDFS is using the /dfs/... layout structure format the HDFS hadoop file system: Start the java components for the HDFS system As an additional check, you can use the GUI Hadoop browsers to check the content of your HDFS configurations: Once our HDFS Hadoop setup is done you can use the HDFS file system to store data ( big data : )), and plug them back and forth to Oracle Databases by the means of the Big Data Connectors ( which is the next configuration step). You can create / use a Hive db, but in our case we will make a simple integration of "raw data" , through the creation of an External Table to a local Oracle instance ( on the same Linux box, we run the Hadoop HDFS one node cluster and one Oracle DB). Download some public "big data", I use the site: http://france.meteofrance.com/france/observations, from where I can get *.csv files for my big data simulations :). Here is the data layout of my example file: Download the Big Data Connector from the OTN (oraosch-2.2.0.zip), unzip it to your local file system (see picture below) Modify your environment in order to access the connector libraries , and make the following test: [oracle@dg1 bin]$./hdfs_stream Usage: hdfs_stream locationFile [oracle@dg1 bin]$ Load the data to the Hadoop hdfs file system: hadoop fs -mkdir bgtest_data hadoop fs -put obsFrance.txt bgtest_data/obsFrance.txt hadoop fs -ls /user/oracle/bgtest_data/obsFrance.txt [oracle@dg1 bg-data-raw]$ hadoop fs -ls /user/oracle/bgtest_data/obsFrance.txt Found 1 items -rw-r--r-- 1 oracle supergroup 54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt [oracle@dg1 bg-data-raw]$hadoop fs -ls hdfs:///user/oracle/bgtest_data/obsFrance.txt Found 1 items -rw-r--r-- 1 oracle supergroup 54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt Check the content of the HDFS with the browser UI: Start the Oracle database, and run the following script in order to create the Oracle database user, the Oracle directories for the Oracle Big Data Connector (dg1 it’s my own db id replace accordingly yours): #!/bin/bash export ORAENV_ASK=NO export ORACLE_SID=dg1 . oraenv sqlplus /nolog <<EOF CONNECT / AS sysdba; CREATE OR REPLACE DIRECTORY osch_bin_path AS '/u01/orahdfs-2.2.0/bin'; CREATE USER BGUSER IDENTIFIED BY oracle; GRANT CREATE SESSION, CREATE TABLE TO BGUSER; GRANT EXECUTE ON sys.utl_file TO BGUSER; GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO BGUSER; CREATE OR REPLACE DIRECTORY BGT_LOG_DIR as '/u01/BG_TEST/logs'; GRANT READ, WRITE ON DIRECTORY BGT_LOG_DIR to BGUSER; CREATE OR REPLACE DIRECTORY BGT_DATA_DIR as '/u01/BG_TEST/data'; GRANT READ, WRITE ON DIRECTORY BGT_DATA_DIR to BGUSER; EOF Put the following in a file named t3.sh and make it executable, hadoop jar $OSCH_HOME/jlib/orahdfs.jar \ oracle.hadoop.exttab.ExternalTable \ -D oracle.hadoop.exttab.tableName=BGTEST_DP_XTAB \ -D oracle.hadoop.exttab.defaultDirectory=BGT_DATA_DIR \ -D oracle.hadoop.exttab.dataPaths="hdfs:///user/oracle/bgtest_data/obsFrance.txt" \ -D oracle.hadoop.exttab.columnCount=7 \ -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/dg1 \ -D oracle.hadoop.connection.user=BGUSER \ -D oracle.hadoop.exttab.printStackTrace=true \ -createTable --noexecute then test the creation fo the external table with it: [oracle@dg1 samples]$ ./t3.sh ./t3.sh: line 2: /u01/orahdfs-2.2.0: Is a directory Oracle SQL Connector for HDFS Release 2.2.0 - Production Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved. Enter Database Password:] The create table command was not executed. The following table would be created. CREATE TABLE "BGUSER"."BGTEST_DP_XTAB" ( "C1" VARCHAR2(4000), "C2" VARCHAR2(4000), "C3" VARCHAR2(4000), "C4" VARCHAR2(4000), "C5" VARCHAR2(4000), "C6" VARCHAR2(4000), "C7" VARCHAR2(4000) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY "BGT_DATA_DIR" ACCESS PARAMETERS ( RECORDS DELIMITED BY 0X'0A' CHARACTERSET AL32UTF8 STRING SIZES ARE IN CHARACTERS PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream' FIELDS TERMINATED BY 0X'2C' MISSING FIELD VALUES ARE NULL ( "C1" CHAR(4000), "C2" CHAR(4000), "C3" CHAR(4000), "C4" CHAR(4000), "C5" CHAR(4000), "C6" CHAR(4000), "C7" CHAR(4000) ) ) LOCATION ( 'osch-20131022081035-74-1' ) ) PARALLEL REJECT LIMIT UNLIMITED; The following location files would be created. osch-20131022081035-74-1 contains 1 URI, 54103 bytes 54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt Then remove the --noexecute flag and create the external Oracle table for the Hadoop data. Check the results: The create table command succeeded. CREATE TABLE "BGUSER"."BGTEST_DP_XTAB" ( "C1" VARCHAR2(4000), "C2" VARCHAR2(4000), "C3" VARCHAR2(4000), "C4" VARCHAR2(4000), "C5" VARCHAR2(4000), "C6" VARCHAR2(4000), "C7" VARCHAR2(4000) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY "BGT_DATA_DIR" ACCESS PARAMETERS ( RECORDS DELIMITED BY 0X'0A' CHARACTERSET AL32UTF8 STRING SIZES ARE IN CHARACTERS PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream' FIELDS TERMINATED BY 0X'2C' MISSING FIELD VALUES ARE NULL ( "C1" CHAR(4000), "C2" CHAR(4000), "C3" CHAR(4000), "C4" CHAR(4000), "C5" CHAR(4000), "C6" CHAR(4000), "C7" CHAR(4000) ) ) LOCATION ( 'osch-20131022081719-3239-1' ) ) PARALLEL REJECT LIMIT UNLIMITED; The following location files were created. osch-20131022081719-3239-1 contains 1 URI, 54103 bytes 54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt This is the view from the SQL Developer: and finally the number of lines in the oracle table, imported from our Hadoop HDFS cluster SQL select count(*) from "BGUSER"."BGTEST_DP_XTAB"; COUNT(*) ---------- 1151 In a next post we will integrate data from a Hive database, and try some ODI integrations with the ODI Big Data connector. Our simplistic approach is just a step to show you how these unstructured data world can be integrated to Oracle infrastructure. Hadoop, BigData, NoSql are great technologies, they are widely used and Oracle is offering a large integration infrastructure based on these services. Oracle University presents a complete curriculum on all the Oracle related technologies: NoSQL: Introduction to Oracle NoSQL Database Using Oracle NoSQL Database Big Data: Introduction to Big Data Oracle Big Data Essentials Oracle Big Data Overview Oracle Data Integrator: Oracle Data Integrator 12c: New Features Oracle Data Integrator 11g: Integration and Administration Oracle Data Integrator: Administration and Development Oracle Data Integrator 11g: Advanced Integration and Development Oracle Coherence 12c: Oracle Coherence 12c: New Features Oracle Coherence 12c: Share and Manage Data in Clusters Oracle Coherence 12c: Oracle GoldenGate 11g: Fundamentals for Oracle Oracle GoldenGate 11g: Fundamentals for SQL Server Oracle GoldenGate 11g Fundamentals for Oracle Oracle GoldenGate 11g Fundamentals for DB2 Oracle GoldenGate 11g Fundamentals for Teradata Oracle GoldenGate 11g Fundamentals for HP NonStop Oracle GoldenGate 11g Management Pack: Overview Oracle GoldenGate 11g Troubleshooting and Tuning Oracle GoldenGate 11g: Advanced Configuration for Oracle Other Resources: Apache Hadoop : http://hadoop.apache.org/ is the homepage for these technologies. "Hadoop Definitive Guide 3rdEdition" by Tom White is a classical lecture for people who want to know more about Hadoop , and some active "googling " will also give you some more references. About the author: Eugene Simos is based in France and joined Oracle through the BEA-Weblogic Acquisition, where he worked for the Professional Service, Support, end Education for major accounts across the EMEA Region. He worked in the banking sector, ATT, Telco companies giving him extensive experience on production environments. Eugen currently specializes in Oracle Fusion Middleware teaching an array of courses on Weblogic/Webcenter, Content,BPM /SOA/Identity-Security/GoldenGate/Virtualisation/Unified Comm Suite) throughout the EMEA region.

Read the article

Looking Back at MIX10

- by WeigeltRo

It’s the sad truth of my life that even though I’m fascinated by airplanes and flight in general since my childhood days, my body doesn’t like flying. Even the ridiculously short flights inside Germany are taking their toll on me each time. Now combine this with sitting in the cramped space of economy class for many hours on a transatlantic flight from Germany to Las Vegas and back, and factor in some heavy dose of jet lag (especially on my way eastwards), and you get an idea why after coming back home I had this question on my mind: Was it really worth it to attend MIX10? This of course is a question that will also be asked by my boss at Comma Soft (for other reasons, obviously), who decided to send me and my colleague Jens Schaller, to the MIX10 conference. (A note to my German readers: An dieser Stelle der Hinweis, dass Comma Soft noch Silverlight-Entwickler und/oder UI-Designer für den Standort Bonn sucht – aussagekräftige Bewerbungen bitte an [email protected]) Too keep things short: My answer is yes. Before I’ll go into detail, let me ask the heretical questions whether tech conferences in general still make sense. There was a time, where actually being at a tech conference gave you a head-start in regard to learning about new technologies. Nowadays this is no longer true, where every bit of information and every detail is immediately twittered, blogged and whatevered to death. In the case of MIX10 you even can download the video-taped sessions shortly after. So: Does visiting a conference still make sense? It depends on what you expect from a conference. It should be clear to everybody that you’ll neither get exclusive information, nor receive training in a small group. What a conference does offer that sitting in front of your computer does not can be summarized as follows: Focus Being away from work and home will help you to focus on the presented information. Of course there are always the poor guys who are haunted by their work (with mails and short text messages reporting the latest showstopper problem), but in general being out of your office makes a huge difference. Inspiration With the focus comes the emotional involvement. I find it much easier to absorb information if I feel that certain vibe when sitting in a session. This still means that I have put work into reviewing the information later, but it’s a better starting point. And all the impressions collected at a (good) conference combined lead to a higher motivation – be it by the buzz (“this is gonna be sooo cool!”) or by the fear to fall behind (“man, we’ll have work on this, or else…”). People At a conference it’s pretty easy to get into contact with other people during breakfast, lunch and other breaks. This is a good opportunity to get a feel for what other development teams are doing (on a very general level of course, nobody will tell you about their secret formula) and what they are thinking about specific technologies. So MIX10 did offer focus, inspiration and people, but that would have meant nothing without valuable content. When I (being a frontend developer with a strong interest in UI/UX) planned my visit to MIX10, I made the decision to focus on the "soft" topics of design, interaction and user experience. I figured that I would be bombarded with all the technical details about Silverlight 4 anyway in the weeks and months to come. Actually, I would have liked to catch a few technical sessions, but the agenda wasn’t exactly in favor of people interested in any kind of Silverlight and UI/UX/Design topics. That’s one of my few complaints about the conference – I would have liked one more day and/or more sessions per day. Overall, the quality of the workshops and sessions was pretty high. In fact, looking back at my collection of conferences I’ve visited in the past I’d say that MIX10 ranks somewhere near the top spot. Here’s an overview of the workshops/sessions I attended (I’ll leave out the keynotes): Day 0 (Workshops on Sunday) Design Fundamentals for Developers Robby Ingebretsen is the man! Great workshop in three parts with the perfect mix of examples, well-structured definition of terminology and the right dose of humor. Robby was part of the WPF team before founding his own company so he not only has a strong interest in design (and the skillz!) but also the technical background. Design Tools and Techniques Originally announced to be held by Arturo Toledo, the Rosso brothers from ArcheType filled in for the first two parts, and Corrina Black had a pretty general part about the Windows Phone UI. The first two thirds were a mixed bag; the two guys definitely knew what they were talking about, and the demos were great, but the talk lacked the preparation and polish of a truly great presentation. Corrina was not allowed to go into too much detail before the keynote on Monday, but the session was still very interesting as it showed how much thought went into the Windows Phone UI (and there’s always a lot to learn when people talk about their thought process). Day 1 (Monday) Designing Rich Experiences for Data-Centric Applications I wonder whether there was ever a test-run for this session, but what Ken Azuma and Yoshihiro Saito delivered in the first 15 minutes of a 30-minutes-session made me walk out. A commercial for a product (just great: a video showing a SharePoint plug-in in an all-Japanese UI) combined with the most generic blah blah one could imagine. EPIC FAIL. Great User Experiences: Seamlessly Blending Technology & Design I switched to this session from the one above but I guess I missed the interesting part – what I did catch was what looked like a “look at the cool stuff we did” without being helpful. Or maybe I was just in a bad mood after the other session. The Art, Technology and Science of Reading This talk by Kevin Larson was very interesting, but was more a presentation of what Microsoft is doing in research (pretty impressive) and in the end lacked a bit the helpful advice one could have hoped for. 10 Ways to Attack a Design Problem and Come Out Winning Robby Ingebretsen again, and again a great mix of theory and practice. The clean and simple, yet effective, UI of the reader app resulted in a simultaneous “wow” of Jens and me. If you’d watch only one session video, this should be it. Microsoft has to bring Robby back next year! Day 2 (Tuesday) Touch in Public: Multi-touch Interaction Design for Kiosks & Architectural Experiences Very interesting session by Jason Brush, a great inspiration with many details to look out for in the examples. Exactly what I was hoping for – and then some! Designing Bing: Heart and Science How hard can it be to design the UI for a search engine? An input field and a list of results, that should be it, right? Well, not so fast! The talk by Paul Ray showed the many iterations to finally get it right (up to the choice of a specific blue for the links). And yes, I want an eye-tracking device to play around with! The Elephant in the Room When Nishant Kothary presented a long list of what his session was not about, I told to myself (not having the description text present) “Am I in the wrong talk? Should I leave?”. Boy, was I wrong. A great talk about human factors in the process of designing stuff. An Hour with Bill Buxton Having seen Bill Buxton’s presentation in the keynote, I just had to see this man again – even though I didn’t know what to expect. Being more or less unplanned and intended to be more of a conversation, the session didn’t provide a wealth of immediately useful information. Nevertheless Bill Buxton was impressive with his huge knowledge of seemingly everything. But this could/should have been a session some when in the evening and not in parallel to at least two other interesting talks. Day 3 (Wednesday) Design the Ordinary, Like the Fixie This session by DL Byron and Kevin Tamura started really well and brought across the message to keep things simple. But towards the end the talk lost some of its steam. And, as a member of the audience pointed out, they kind of ignored their own advice when they used a fancy presentation software other then PowerPoint that sometimes got in the way of showing things. Developing Natural User Interfaces Speaking of alternative presentation software, Joshua Blake definitely had the most remarkable alternative to PowerPoint, a self-written program called NaturalShow that was controlled using multi-touch on a touch screen. Not a PowerPoint-killer, but impressive nevertheless. The (excellent) talk itself was kind of eye-opening in regard to what “multi-touch support” on various platforms (WPF, Silverlight, Windows Phone) actually means. Treat your Content Right The talk by Tiffani Jones Brown wasn’t even on my planned schedule, but somehow I ended up in that session – and it was great. And even for people who don’t necessarily have to write content for websites, some points made by Tiffani are valid in many places, notably wherever you put texts with more than a single word into your UI. Creating Effective Info Viz in Microsoft Silverlight The last session of MIX10 I attended was kind of disappointing. At first things were very promising, with Matthias Shapiro giving a brief but well-structured introduction to info graphics and interactive visualizations. Then the live-coding began and while the result was interesting, too much time was spend on wrestling to get the code working. Ending earlier than planned, the talk was a bit light on actual content, but at least it included a nice list of resources. Conclusion It could be felt all across MIX10, UIs will take a huge leap forward; in fact, there are enough examples that have already. People who both have the technical know-how and at least a basic understanding of design (“literacy” as Bill Buxton called it) are in high demand. The concept of the MIX conference and initiatives like design.toolbox shows that Microsoft understands very well that frontend developers have to acquire new knowledge besides knowing how to hack code and putting buttons on a form. There are extremely exciting times before us, with lots of opportunity for those who are eager to develop their skills, that is for sure.

Read the article

What strategy do you use for package naming in Java projects and why?

- by Tim Visher

I thought about this awhile ago and it recently resurfaced as my shop is doing its first real Java web app. As an intro, I see two main package naming strategies. (To be clear, I'm not referring to the whole 'domain.company.project' part of this, I'm talking about the package convention beneath that.) Anyway, the package naming conventions that I see are as follows: Functional: Naming your packages according to their function architecturally rather than their identity according to the business domain. Another term for this might be naming according to 'layer'. So, you'd have a *.ui package and a *.domain package and a *.orm package. Your packages are horizontal slices rather than vertical. This is much more common than logical naming. In fact, I don't believe I've ever seen or heard of a project that does this. This of course makes me leery (sort of like thinking that you've come up with a solution to an NP problem) as I'm not terribly smart and I assume everyone must have great reasons for doing it the way they do. On the other hand, I'm not opposed to people just missing the elephant in the room and I've never heard a an actual argument for doing package naming this way. It just seems to be the de facto standard. Logical: Naming your packages according to their business domain identity and putting every class that has to do with that vertical slice of functionality into that package. I have never seen or heard of this, as I mentioned before, but it makes a ton of sense to me. I tend to approach systems vertically rather than horizontally. I want to go in and develop the Order Processing system, not the data access layer. Obviously, there's a good chance that I'll touch the data access layer in the development of that system, but the point is that I don't think of it that way. What this means, of course, is that when I receive a change order or want to implement some new feature, it'd be nice to not have to go fishing around in a bunch of packages in order to find all the related classes. Instead, I just look in the X package because what I'm doing has to do with X. From a development standpoint, I see it as a major win to have your packages document your business domain rather than your architecture. I feel like the domain is almost always the part of the system that's harder to grok where as the system's architecture, especially at this point, is almost becoming mundane in its implementation. The fact that I can come to a system with this type of naming convention and instantly from the naming of the packages know that it deals with orders, customers, enterprises, products, etc. seems pretty darn handy. It seems like this would allow you to take much better advantage of Java's access modifiers. This allows you to much more cleanly define interfaces into subsystems rather than into layers of the system. So if you have an orders subsystem that you want to be transparently persistent, you could in theory just never let anything else know that it's persistent by not having to create public interfaces to its persistence classes in the dao layer and instead packaging the dao class in with only the classes it deals with. Obviously, if you wanted to expose this functionality, you could provide an interface for it or make it public. It just seems like you lose a lot of this by having a vertical slice of your system's features split across multiple packages. I suppose one disadvantage that I can see is that it does make ripping out layers a little bit more difficult. Instead of just deleting or renaming a package and then dropping a new one in place with an alternate technology, you have to go in and change all of the classes in all of the packages. However, I don't see this is a big deal. It may be from a lack of experience, but I have to imagine that the amount of times you swap out technologies pales in comparison to the amount of times you go in and edit vertical feature slices within your system. So I guess the question then would go out to you, how do you name your packages and why? Please understand that I don't necessarily think that I've stumbled onto the golden goose or something here. I'm pretty new to all this with mostly academic experience. However, I can't spot the holes in my reasoning so I'm hoping you all can so that I can move on. Thanks in advance!

Developer IT