Search Results

Search found 36 results on 2 pages for 'cloudera'.

Page 1/2 | 1 2  | Next Page >

  • Cloudera Manager CDH5 - Installation Failure on Oozie Database

    - by Nerrve
    While doing the installation, I keep getting a failure on the step "Creating Oozie database": java.lang.Exception: DB schema exists at org.apache.oozie.tools.OozieDBCLI.validateDBSchema(OozieDBCLI.java:877) at org.apache.oozie.tools.OozieDBCLI.createDB(OozieDBCLI.java:184) at org.apache.oozie.tools.OozieDBCLI.run(OozieDBCLI.java:127) at org.apache.oozie.tools.OozieDBCLI.main(OozieDBCLI.java:78) How do I fix this? Where do I get the password/username/dbname for the PostgreSQL database so I can drop the existing schema? I tried cat /etc/cloudera-scm-server/db*.properties | grep pass and /var/lib/cloudera-scm-server-db/data/generated-password.txt but the passwords don't work!
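
    One way to investigate, sketched below under the assumption that Cloudera Manager's embedded PostgreSQL is in use (it listens on port 7432 by default). The database name shown is a placeholder; confirm the real name, user and password under the Oozie service's Database settings in Cloudera Manager before dropping anything:

      # embedded-postgres superuser password (the underscore spelling generated_password.txt is worth trying too)
      sudo cat /var/lib/cloudera-scm-server-db/data/generated_password.txt
      # connect to the embedded PostgreSQL instance
      psql -h localhost -p 7432 -U cloudera-scm postgres
      postgres=# \l                                  -- list databases; find the one Oozie was given
      postgres=# DROP DATABASE "oozie_oozie_server"; -- hypothetical name; drop only the stale Oozie database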

    Read the article

  • Cloudera Hadoop Certification Value in IT Industry for freshers

    - by Saumitra
    I am a software developer with 8 months of experience in the IT industry, working on the development of tools for BIG DATA analytics. I have learned Hadoop basics on my own and I am pretty comfortable with writing MapReduce jobs, PIG, HIVE, Flume and other related projects. I am thinking of appearing for the Cloudera Hadoop Certification. My question is whether it will benefit me in any way, considering that I am a fresher with not even 1 year of experience. Most of the job postings I have seen related to Hadoop require at least 3 years of experience. I currently work in India but I can relocate. Please help me decide whether I should invest my time in perfecting my Hadoop skills for this certification.

    Read the article

  • Typical Hadoop setup for remote job submission

    - by Artii
    So I am still a bit new to Hadoop and am currently in the process of setting up a small test cluster on Amazon AWS. My question relates to tips on structuring the cluster so that it is possible to submit jobs from remote machines. Currently I have 5 machines: 4 form the Hadoop cluster proper (NameNodes, YARN, etc.) and one machine is used as a manager machine (Cloudera Manager). I am going to describe my thinking on the setup, and if anyone can chime in on the points I am not clear about, that would be great. I was thinking about the best setup for a small cluster, so I decided to expose only the manager machine and probably use it to submit all the jobs. The other machines will see each other, but not be accessible from the outside world. I have a conceptual idea of how to do this, but I am not sure how to properly go about it; if anyone could point me in the right direction that would be great. Another big point is that I want to be able to submit jobs to the cluster through the exposed machine from a client machine (which might be Windows). I am not so clear on this setup either: do I need to have Hadoop installed on the client machine in order to use the normal hadoop commands and to write/submit jobs, say from Eclipse or something similar? So to sum it up, my questions are: (1) Is this an OK setup for a small test cluster? (2) How can I go about using one exposed machine to submit/route jobs to the cluster without having any of the Hadoop nodes on it? (3) How do I set up a client machine to submit jobs to a remote cluster, with an example of how to do it on Windows? (4) Are there any reasons not to use Windows as a client machine in this setup? Thanks, I would greatly appreciate any advice or help on this.
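
    For the client-side piece, a minimal sketch of one common approach (not the only one, and not specific to this poster's cluster): install the Hadoop client binaries on the submitting machine, give it a copy of the cluster's client configuration (or pass the addresses explicitly), and use the standard tools. Hostnames, ports, jar and class names below are placeholders, and the driver is assumed to use ToolRunner so that -D options are honoured:

      # submit from a remote client; only the NameNode and ResourceManager ports
      # need to be reachable from this machine
      hadoop jar my-job.jar com.example.MyDriver \
          -D fs.defaultFS=hdfs://namenode-host:8020 \
          -D yarn.resourcemanager.address=resourcemanager-host:8032 \
          -D mapreduce.framework.name=yarn \
          /input /output

    On Windows, the same idea is usually easier from inside an IDE (setting the equivalent properties on the job Configuration) or by going through Hue/Oozie on the exposed manager node, since the command-line tools of that era assumed a Unix-like shell.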

    Read the article

  • Make cloudera-vm work on Oracle VM VirtualBox

    - by ????? ????????
    I downloaded this and the instructions say: Important: You must enable the I/O APIC in order to use 64-bit mode. (See http://www.virtualbox.org/manual/ch03.html.) On newer versions of VirtualBox, it may default to using SATA as the disk interface. This can cause a kernel panic in the VM. Switching to the IDE driver solves this problem. I am running this on Red Hat 64-bit mode (I've also tried on Ubuntu 64-bit with the same result). I pointed to the cloudera-vm image as a startup disk for the VM. I am getting this message: Failed to open a session for the virtual machine ClouderaDevelopment. VT-x features locked or unavailable in MSR. (VERR_VMX_MSR_LOCKED_OR_DISABLED). Result Code: E_FAIL (0x80004005) Component: Console Interface: IConsole {1968b7d3-e3bf-4ceb-99e0-cb7c913317bb} Does anyone know what I am doing wrong?
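
    For what it's worth, VERR_VMX_MSR_LOCKED_OR_DISABLED generally means hardware virtualization (VT-x) is disabled or locked in the host's BIOS/UEFI firmware rather than anything wrong with the VM image, so enabling "Intel Virtualization Technology" there is the usual first step. A hedged sketch of checks from the Linux host (VM name as quoted above):

      egrep -c '(vmx|svm)' /proc/cpuinfo                                     # non-zero means the CPU advertises VT-x/AMD-V
      VBoxManage modifyvm "ClouderaDevelopment" --hwvirtex on --ioapic on    # re-assert the 64-bit friendly VM settings
      VBoxManage startvm "ClouderaDevelopment"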

    Read the article

  • Configuring correct port for Oozie (invoking PIG script) in Cloudera Hue

    - by user2985324
    I am new to the CDH4 Oozie workflow editor. While trying to invoke a Pig script from the Oozie workflow editor, I am getting the following error: HadoopAccessorException: E0900: Jobtracker [mymachine:8032] not allowed, not in Oozie's whitelist It looks like Oozie is submitting the job to the YARN port (8032). I want it to submit to port 8021 (the MR JobTracker). Can someone help me identify where to set the JobTracker URL or port so that Oozie picks up the correct one (using Hue or Cloudera Manager)? Previously I tried the following, but none of them helped: 1. Modified the /user/hue/oozie/workspaces/../workflow.xml file. However, it gets overwritten when I submit the job from the workflow editor. 2. In Cloudera Manager -- Oozie -- Configuration -- Oozie Server (advanced) -- Oozie Server Configuration Safety Valve for oozie-site.xml, I set oozie.service.HadoopAccessorService.nameNode.whitelist to mymachine:8020 and oozie.service.HadoopAccessorService.jobTracker.whitelist to mymachine:8021, and restarted the Oozie service. 3. Tried to override the 'jobTracker' property while configuring the Pig task. This appears as follows in the workflow file, however it doesn't take effect (or doesn't override) and still uses port 8032: <global> <configuration> <property> <name>jobTracker</name> <value>mymachine:8021</value> </property> </configuration> </global> I am using CDH4. Thanks for looking into my question.

    Read the article

  • Can I use Cloudera Hadoop without root access?

    - by in_the_cloud
    A bit of a binary question (okay, not exactly), but I was wondering whether one is able to configure Cloudera / Hadoop to run on the nodes without root shell access to the node computers (although I can set up passwordless SSH login)? It appears from their instructions that root access is needed, yet I found a Hadoop wiki page which suggests root access might not be needed: http://wiki.apache.org/nutch/NutchHadoopTutorial
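
    A hedged sketch of the usual root-free route: the Apache/CDH tarballs (as opposed to the rpm/deb packages, which do assume root) can be unpacked and run entirely as an ordinary user, provided the data, log and temp directories are ones you own. Version number and paths below are placeholders:

      tar -xzf hadoop-x.y.z.tar.gz -C $HOME
      cd $HOME/hadoop-x.y.z
      export JAVA_HOME=/usr/lib/jvm/java-6-sun     # any JDK readable by your user
      # point dfs.name.dir, dfs.data.dir, hadoop.tmp.dir and the log directory at
      # locations under $HOME in conf/*-site.xml and conf/hadoop-env.sh, then:
      bin/start-all.sh                             # start the daemons as yourself, no sudo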

    Read the article

  • Cloudera Manager agent deploy failing to receive heartbeat from agent

    - by user150341
    All, I am getting the following error on the console at the last phase of the installation: Installation failed. Failed to receive heartbeat from agent Server Log: 2012-12-19 00:32:12,132 INFO [NodeConfiguratorThread-4- 0:node.NodeConfiguratorProgress@503] 192.168.1.100: Setting WAIT_FOR_HEARTBEAT as failed and done state All nodes (the name node and two client nodes) are VMs running 64-bit CentOS. sshd has been enabled on all nodes, and the VMs are set to bridged networking. Any clue on how to fix this error?
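
    A hedged checklist rather than a definite fix: heartbeat failures at this stage are most often hostname resolution or firewall problems between the new agent and the Cloudera Manager server. On the failing node (the server hostname below is a placeholder):

      sudo service cloudera-scm-agent status                             # is the agent actually running?
      grep -E 'server_(host|port)' /etc/cloudera-scm-agent/config.ini    # must point at the CM server
      telnet cm-server.example.com 7182                                  # agents heartbeat to the server on 7182 by default
      hostname -f                                                        # should resolve to the same name the server sees
      sudo tail -n 50 /var/log/cloudera-scm-agent/cloudera-scm-agent.log
      # also check iptables/SELinux on CentOS, which block the agent ports out of the box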

    Read the article

  • What is the value of the Cloudera Hadoop Certification for people new to the IT industry?

    - by Saumitra
    I am a software developer with 8 months of experience in the IT industry, currently working on the development of tools for BIG DATA analytics. I have learned Hadoop basics on my own and I am pretty comfortable with writing MapReduce Jobs, PIG, HIVE, Flume and other related projects. I am thinking of taking the exam for the Cloudera Hadoop Certification. Will this certification add value, considering that I have less than 1 year of experience? Many of the jobs I've seen relating to Hadoop require at least 3 years of experience. Should I invest more time in learning Hadoop and improving my skills to take this certification?

    Read the article

  • Using a "local" S3 emulation layer as a replacement for HDFS?

    - by user183394
    I have been testing out the most recent Cloudera CDH4 hadoop-conf-pseudo (i.e. MRv2 or YARN) on a notebook, which has 4 cores, 8GB RAM, an Intel X25MG2 SSD, and runs an S3 emulation layer my colleagues and I wrote in C++. The OS is Ubuntu 12.04 LTS 64-bit. So far so good. Looking at "Setting up hadoop to use S3 as a replacement for HDFS", I would like to do it on my notebook. Nevertheless, I can't find where I can change jets3t.properties to set the endpoint to localhost. I downloaded hadoop-2.0.1-alpha.tar.gz and searched the source without finding a clue. There is a similar Q on SO, "Using s3 as fs.default.name or HDFS?", but I want to use our own lightweight and fast S3 emulation layer, instead of AWS S3, for our experiments. I would appreciate a hint as to how I can change the endpoint to a different hostname. Regards, --Zack
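
    Not a verified recipe, but one place to look: the s3n:// filesystem in Hadoop of this vintage delegates to the JetS3t library, which reads a jets3t.properties file from the classpath (dropping it into the Hadoop conf directory is one common way). The property names below are standard JetS3t options and the values are examples for a local emulator; verify them against the JetS3t version actually bundled with your Hadoop build:

      # jets3t.properties
      s3service.s3-endpoint=localhost
      s3service.s3-endpoint-http-port=8080
      s3service.https-only=false
      s3service.disable-dns-buckets=true

      # core-site.xml (fragment) -- bucket name and keys are placeholders
      #   <property><name>fs.default.name</name><value>s3n://mybucket</value></property>
      #   <property><name>fs.s3n.awsAccessKeyId</name><value>KEY</value></property>
      #   <property><name>fs.s3n.awsSecretAccessKey</name><value>SECRET</value></property>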

    Read the article

  • Not able to connect to a port other than 22 - OpenVPN

    - by t8h7gu
    I have an OpenVPN network with 5 clients: a computer with Arch Linux which hosts the OpenVPN server and also hosts a virtual machine with CentOS, which is also connected to the OpenVPN subnet; a Windows 8 machine which hosts a virtual machine with CentOS, both of them connected to OpenVPN; and the last machine is a virtual machine with CentOS hosted by a computer with Ubuntu 14 (which is not connected to OpenVPN). All machines in the OpenVPN subnet are bolded. All physical computers are in different networks. The problem is that when I use nmap to scan the Windows machine and its guest virtual machine, it says the host seems down. When I force nmap to scan a specific port, it shows a filtered state: nmap -Pn -p 50010 n3 Starting Nmap 6.46 ( http://nmap.org ) at 2014-06-07 19:49 CEST Nmap scan report for n3 (10.8.0.3) Host is up (0.11s latency). rDNS record for 10.8.0.3: node3.com PORT STATE SERVICE 50010/tcp filtered unknown Telnet also cannot connect to this port: telnet n3 50010 Trying 10.8.0.3... telnet: Unable to connect to remote host: No route to host But ss on this host shows the proper state of this port: ss -anp | grep 50010 LISTEN 0 50 10.8.0.3:50010 *:* users:(("java",12310,271)) What might be the reason for this, and how can I fix it? EDIT I've found that I am able to connect via telnet to the SSH port: telnet n3 22 Trying 10.8.0.3... Connected to n3. Escape character is '^]'. SSH-2.0-OpenSSH_5.3 So it seems that it's not a problem with the Windows firewall. But I have no idea what it might be. Also the nmap result for the first thousand ports: nmap -Pn -p 1-1000 n3 Starting Nmap 6.46 ( http://nmap.org ) at 2014-06-07 20:08 CEST Nmap scan report for n3 (10.8.0.3) Host is up (0.49s latency). rDNS record for 10.8.0.3: node3.com Not shown: 999 filtered ports PORT STATE SERVICE 22/tcp open ssh Nmap done: 1 IP address (1 host up) scanned in 77.87 seconds
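
    An educated guess rather than a confirmed diagnosis: "filtered" from nmap combined with "No route to host" from telnet, while port 22 works, is the classic signature of the stock CentOS iptables policy, which allows SSH and rejects everything else with icmp-host-prohibited. Worth checking on the CentOS guest (n3) itself:

      sudo iptables -L INPUT -n --line-numbers                  # look for a REJECT ... icmp-host-prohibited rule after the ssh ACCEPT
      sudo iptables -I INPUT -p tcp --dport 50010 -j ACCEPT     # insert an allow rule above the REJECT
      sudo service iptables save                                # persist it, if this turns out to be the cause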

    Read the article

  • Big Data Appliance X4-2 Release Announcement

    - by Jean-Pierre Dijcks
    Today we are announcing the release of the 3rd generation Big Data Appliance. Read the Press Release here.

    Software Focus
    The focus for this 3rd generation of Big Data Appliance is:
      • Comprehensive and Open - Big Data Appliance now includes all Cloudera software, including Back-up and Disaster Recovery (BDR), Search, Impala and Navigator, as well as the previously included components (like CDH, HBase and Cloudera Manager) and Oracle NoSQL Database (CE or EE)
      • Lower TCO than DIY Hadoop systems
      • Simplified operations while providing an open platform for the organization
      • Comprehensive security, including the new Audit Vault and Database Firewall software, Apache Sentry and Kerberos configured out-of-the-box

    Hardware Update
    A good place to start is to quickly review the hardware differences (no price changes!). On a per-node basis, the following is a comparison between the old (X3-2) and new (X4-2) hardware:

                      Big Data Appliance X3-2                      Big Data Appliance X4-2
      CPU             2 x 8-Core Intel® Xeon® E5-2660 (2.2 GHz)    2 x 8-Core Intel® Xeon® E5-2650 V2 (2.6 GHz)
      Memory          64GB                                         64GB
      Disk            12 x 3TB High Capacity SAS                   12 x 4TB High Capacity SAS
      InfiniBand      40Gb/sec                                     40Gb/sec
      Ethernet        10Gb/sec                                     10Gb/sec

    For all the details on the environmentals and other useful information, review the data sheet for Big Data Appliance X4-2. The larger disks give BDA X4-2 33% more capacity over the previous generation while adding faster CPUs. Memory for BDA is expandable to 512 GB per node and can be done on a per-node basis, for example for NameNodes, HBase region servers, or NoSQL Database nodes.

    Software Details
    More details in terms of software and the current versions (note that BDA follows a three-monthly update cycle for Cloudera and other software):

                          Big Data Appliance 2.2 Software Stack    Big Data Appliance 2.3 Software Stack
      Linux               Oracle Linux 5.8 with UEK 1              Oracle Linux 6.4 with UEK 2
      JDK                 JDK 6                                    JDK 7
      Cloudera CDH        CDH 4.3                                  CDH 4.4
      Cloudera Manager    CM 4.6                                   CM 4.7

    And as we said at the beginning, it is important to understand that all other Cloudera components are now included in the price of Oracle Big Data Appliance. They are fully supported by Oracle and available for all BDA customers.

    For more information:
      • Big Data Appliance Data Sheet
      • Big Data Connectors Data Sheet
      • Oracle NoSQL Database Data Sheet (CE | EE)
      • Oracle Advanced Analytics Data Sheet

    Read the article

  • Example map-reduce oozie program not working on CDH 4.5

    - by user2002748
    I am using Hadoop (CDH 4.5) on my mac since some time now, and do run map reduce jobs regularly. I installed oozie recently (again, CDH4.5) following instructions at: http://archive.cloudera.com/cdh4/cdh/4/oozie-3.3.2-cdh4.5.0/DG_QuickStart.html, and tried to run sample programs provided. However, it always fails with the following error. Looks like the workflow is not getting run at all. The Console URL field in the Job info is also empty. Could someone please help on this? The relevant snippet of the Oozie Job log follows. 2014-06-10 17:27:18,414 INFO ActionStartXCommand:539 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@:start:] Start action [0000000-140610172702069-oozie-usrX-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2014-06-10 17:27:18,417 WARN ActionStartXCommand:542 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@:start:] [***0000000-140610172702069-oozie-usrX-W@:start:***]Action status=DONE 2014-06-10 17:27:18,417 WARN ActionStartXCommand:542 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@:start:] [***0000000-140610172702069-oozie-usrX-W@:start:***]Action updated in DB! 2014-06-10 17:27:18,576 INFO ActionStartXCommand:539 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@mr-node] Start action [0000000-140610172702069-oozie-usrX-W@mr-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2014-06-10 17:27:19,188 WARN MapReduceActionExecutor:542 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@mr-node] credentials is null for the action 2014-06-10 17:27:19,423 WARN ActionStartXCommand:542 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@mr-node] Error starting action [mr-node]. 
ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: Unknown rpc kind RPC_WRITABLE] org.apache.oozie.action.ActionExecutorException: JA009: Unknown rpc kind RPC_WRITABLE at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:418) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:773) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:927) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59) at org.apache.oozie.command.XCommand.call(XCommand.java:277) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE at org.apache.hadoop.ipc.Client.call(Client.java:1238) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225) at org.apache.hadoop.mapred.$Proxy30.getDelegationToken(Unknown Source) at org.apache.hadoop.mapred.JobClient.getDelegationToken(JobClient.java:2125) at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:372) at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:970) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:723) ... 10 more 2014-06-10 17:27:19,426 INFO ActionStartXCommand:539 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@mr-node] Next Retry, Attempt Number [1] in [60,000] milliseconds 2014-06-10 17:28:19,468 INFO ActionStartXCommand:539 - USER[userXXX] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-140610172702069-oozie-usrX-W] ACTION[0000000-140610172702069-oozie-usrX-W@mr-node] Start action [0000000-140610172702069-oozie-usrX-W@mr-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
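
    A hedged reading of the failure, based only on the error text: "Unknown rpc kind RPC_WRITABLE" usually means the Oozie action is talking Writable RPC to a daemon that only speaks protobuf RPC -- in practice, a jobTracker value pointing at the wrong MapReduce flavour for the cluster (MR1 JobTracker, default port 8021, versus YARN ResourceManager, default port 8032). One thing to try is making the sample app's job.properties match what is actually installed; the values below are illustrative placeholders:

      # job.properties of the sample map-reduce app
      nameNode=hdfs://localhost:8020
      jobTracker=localhost:8021      # MR1 JobTracker; use localhost:8032 only if Oozie is built against YARN
      queueName=default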

    Read the article

  • Any good method for mounting Hadoop HDFS from another system?

    - by Beel
    I want to mount Cloudera Hadoop as a Linux file system over the LAN. As a setup, I already have the Hadoop cluster running on a set of Ubuntu machines, but now I need to be able to use it as a normal file system from a Fedora system over the LAN. I tried FUSE, but two things: 1. Cloudera says FUSE loses data (click here for that comment by a Cloudera employee on the official Cloudera support site); 2. I've had no success making it work the way we want. As a point of clarification, I am using Hadoop ONLY for the file system, not for its other capabilities.
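
    For reference, the FUSE route the poster already tried is still the usual answer for this era of CDH; a hedged sketch, with hostnames as placeholders, the Cloudera yum repository assumed to be configured on the client, and the package name varying by release (hadoop-0.20-fuse on CDH3, hadoop-hdfs-fuse on CDH4):

      sudo yum install hadoop-hdfs-fuse            # on the Fedora client
      sudo mkdir -p /mnt/hdfs
      hadoop-fuse-dfs dfs://namenode-host:8020 /mnt/hdfs
      ls /mnt/hdfs                                 # HDFS now browsable as a local (non-POSIX-complete) filesystem

    Whether this satisfies the durability concerns quoted from Cloudera is a separate question; plain hadoop fs or WebHDFS access from the Fedora box is the usual fallback.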

    Read the article

  • Which Hadoop API version should I use?

    - by Niels Basjes
    In the latest Hadoop Studio, the 0.18 API of Hadoop is called "Stable" and the 0.20 API is called "Unstable". The distribution that comes from Yahoo is a 0.20 (with Yahoo patches), which is apparently "the way to go". Cloudera states that 0.20 (with Cloudera patches) is also stable. Now, given that we'll start coding a new Hadoop project in the next few weeks: which API should we use, and which Hadoop distribution (Apache, Cloudera, Yahoo, ...) should we use? Thanks for your insights.

    Read the article

  • Oracle Announces Oracle Big Data Appliance X3-2 and Enhanced Oracle Big Data Connectors

    - by jgelhaus
    Enables Customers to Easily Harness the Business Value of Big Data at Lower Cost

    Engineered System Simplifies Big Data for the Enterprise
    Oracle Big Data Appliance X3-2 hardware features the latest 8-core Intel® Xeon E5-2600 series of processors, and compared with the previous generation, the 18 compute and storage servers with 648 TB raw storage now offer:
      • 33 percent more processing power with 288 CPU cores;
      • 33 percent more memory per node with 1.1 TB of main memory; and
      • up to a 30 percent reduction in power and cooling.
    Oracle Big Data Appliance X3-2 further simplifies implementation and management of big data by integrating all the hardware and software required to acquire, organize and analyze big data. It includes:
      • Support for CDH4.1, including software upgrades developed collaboratively with Cloudera to simplify NameNode High Availability in Hadoop, eliminating the single point of failure in a Hadoop cluster;
      • Oracle NoSQL Database Community Edition 2.0, the latest version that brings better Hadoop integration, elastic scaling and new APIs, including JSON and C support;
      • The Oracle Enterprise Manager plug-in for Big Data Appliance that complements Cloudera Manager to enable users to more easily manage a Hadoop cluster;
      • Updated distributions of Oracle Linux and the Oracle Java Development Kit;
      • An updated distribution of open source R, optimized to work with high-performance multi-threaded math libraries.
    Read More | Data sheet: Oracle Big Data Appliance X3-2 | Oracle Big Data Appliance: Datacenter Network Integration | Big Data and Natural Language: Extracting Insight From Text | Thomson Reuters Discusses Oracle's Big Data Platform

    Connectors Integrate Hadoop with Oracle Big Data Ecosystem
    Oracle Big Data Connectors is a suite of software built by Oracle to integrate Apache Hadoop with Oracle Database, Oracle Data Integrator, and Oracle R Distribution. Enhancements to Oracle Big Data Connectors extend these data integration capabilities. With updates to every connector, this release includes:
      • Oracle SQL Connector for Hadoop Distributed File System, for high-performance SQL queries on Hadoop data from Oracle Database, enhanced with increased automation and querying of Hive tables, and now supported within the Oracle Data Integrator Application Adapter for Hadoop;
      • Transparent access to the Hive Query Language from R and the introduction of new analytic techniques executing natively in Hadoop, enabling R developers to be more productive by increasing access to Hadoop in the R environment.
    Read More | Data sheet: Oracle Big Data Connectors | High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

    Read the article

  • Amazon EC2 master node hanging

    - by Algorist
    Hi, I am using the Cloudera setup to launch a Hadoop cluster on Amazon. Sometimes the master Hadoop node hangs and we have to restart the job. Did anyone face a similar problem and resolve the issue? Thank you.

    Read the article

  • Hadoop hdfs namenode is throwing an error

    - by KarmicDice
    Full list of error: hb@localhost:/etc/hadoop/conf$ sudo service hadoop-hdfs-namenode start * Starting Hadoop namenode: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-localhost.out 12/09/10 14:41:09 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = localhost/127.0.0.1 STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.0.1 STARTUP_MSG: classpath = /etc/hadoop/conf:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/commons-logging-api-1.1.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop/lib/json-simple-1.1.jar:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.0.1.jar:/usr/lib/hadoop/lib/avro-1.5.4.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.0.1-tests.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.15.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.0.1.jar:/usr/lib/hadoop-hdfs/lib/avro-1.5.4.jar:/usr/lib/h
adoop-hdfs/lib/paranamer-2.3.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.0.1-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.3.Final.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.15.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/junit-4.8.2.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jdiff-1.0.9.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/avro-1.5.4.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-mapreduce/.//* STARTUP_MSG: build = file:///var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.0.1-Packaging-Hadoop-2012-06-28_17-01-57/hadoop-2.0.0+91-1.cdh4.0.1.p0.1~precise/src/hadoop-common-project/hadoop-common -r 4d98eb718ec0cce78a00f292928c5ab6e1b84695; compiled by 'jenkins' on Thu Jun 28 17:39:19 PDT 2012 ************************************************************/ 12/09/10 14:41:10 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties hdfs-site.xml: hb@localhost:/etc/hadoop/conf$ cat hdfs-site.xml <?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera CM on 2012-09-03T10:13:30.628Z--> <configuration> <property> <name>dfs.https.address</name> <value>localhost:50470</value> </property> <property> <name>dfs.https.port</name> <value>50470</value> </property> <property> <name>dfs.namenode.http-address</name> <value>localhost:50070</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>dfs.blocksize</name> <value>134217728</value> </property> <property> <name>dfs.client.use.datanode.hostname</name> <value>false</value> </property> </configuration>

    Read the article

  • Big Data Appliance

    - by David Dorf
    Today Oracle announced the next release of its Big Data Appliance, an engineered system composed of hardware and software targeting the efficient processing of big data.  The solution leverages 288 Intel cores running Cloudera's distribution of Apache Hadoop in 1.1 TB of main memory.  This monster helps companies acquire, organize, and analyze large volumes of structured and unstructured data. Additionally, new versions of the Oracle Big Data Connectors and Oracle NoSQL Database were released. Why is this important to retailers?  As the infographic below conveys, mobile and social have added even more data to the already huge collections of POS transactions and e-commerce weblogs.  Retailers know that mining that data will help them make better decisions that lead to increased sales, better customer service, and ultimately a successful retail business. [Infographic: Monetate]

    Read the article

  • Where can I find the supported way to deploy hadoop on precise?

    - by Jeff McCarrell
    I want to set up a small (6-node) Hadoop/Hive/Pig cluster. I see the work in the Juju space on charms; however, the current status of deploying a single charm per node will not work for me. I see ServerTeam Hadoop, which talks about re-packaging the Bigtop packages. The Cloudera CDH3 installation guide talks about Maverick and Lucid, but not Precise. What am I missing? Is there a straightforward way to deploy Hadoop/Hive/Pig on 6 nodes that does not involve building from tarballs?
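
    For completeness, the packaged route most people took before the charms matured was Cloudera's own apt repository for CDH4 (which added Precise support) or Apache Bigtop's repository. A hedged sketch follows; the repository package name and the exact per-node role packages are assumptions and should be checked against the CDH4 install guide:

      # on each node, after fetching the CDH4 "one-click install" repo package for precise from archive.cloudera.com
      sudo dpkg -i cdh4-repository_1.0_all.deb
      sudo apt-get update
      # install only the role packages each node needs, for example:
      sudo apt-get install hadoop-hdfs-namenode                                       # on the master
      sudo apt-get install hadoop-hdfs-datanode hadoop-0.20-mapreduce-tasktracker     # on the workers
      sudo apt-get install hive pig                                                   # on whichever node acts as the client/gateway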

    Read the article

  • Winner of the 2012 Government Big Data Solutions Award

    - by Jean-Pierre Dijcks
    Hot off the press: The winner of the 2012 Government Big Data Solutions Award is the National Cancer Institute!! Read all the details on CTOLabs.com. A short excerpt to whet your appetite: "... This solution, based on the Oracle Big Data Appliance with the Cloudera Distribution of Apache Hadoop (CDH), leverages capabilities available from the Big Data community today in pioneering ways that can serve a broad range of researchers. The promising approach of this solution is repeatable across many other Big Data challenges for bioinformatics, making this approach worthy of its selection as the 2012 Government Big Data Solution Award." Read the entire post. Congrats to the entire team!!

    Read the article

  • Amazon Web Services : Fault tolerant solution

    - by Algorist
    Hi, I am using the Boto library to write scripts for automating our jobs on AWS. My script actually starts a Hadoop cluster using the Cloudera scripts and then does some customization. I am having a problem with retries: it seems like every command in my script fails once every couple of days. I started adding retries to all the commands, but then the code is very clumsy and difficult to maintain. What do people do in general? Thank you, Bala

    Read the article

  • Which Hadoop API should I use?

    - by Niels Basjes
    In the latest Hadoop Studio, the 0.18 API of Hadoop is called "Stable" and the 0.20 API is called "Unstable". Now, given that we'll start coding a new Hadoop project in the next few weeks: which API should we use, and which Hadoop distribution (Apache, Cloudera, Yahoo, ...) should we use? Thanks for your insights.

    Read the article

  • Thursday Community Keynote: "By the Community, For the Community"

    - by Janice J. Heiss
    Sharat Chander, JavaOne Community Chairperson, began Thursday's Community Keynote. As part of the morning’s theme of "By the Community, For the Community," Chander noted that 60% of the material at the 2012 JavaOne conference was presented by Java Community members. "So next year, when the call for papers starts, put-in your submissions," he urged.From there, Gary Frost, Principal Member of Technical Staff, AMD, expanded upon Sunday's Strategy Keynote exploration of Project Sumatra, an OpenJDK project targeted at bringing Java to heterogeneous computing platforms (which combine the CPU and the parallel processor of the GPU into a single piece of silicon). Sumatra entails enhancing the JVM to make maximum use of these advanced platforms. Within this development space, AMD created the Aparapi API, which converts Java bytecode into OpenCL for execution on such GPU devices. The Aparapi API was open sourced in September 2011.Whether it was zooming-in on a Mandelbrot set, "the game of life," or a swarm of 10,000 Dukes in a space-bound gravitational dance, Frost's demos, using an Aparapi/OpenCL implementation, produced stunningly faster display results. He indicated that the Java 9 timeframe is where they see Project Sumatra coming to ultimate fruition, employing the Lamdas of Java 8.Returning to the theme of the keynote, Donald Smith, Director, Java Product Management, Oracle, explored a mind map graphic demonstrating the importance of Community in terms of fostering innovation. "It's the sharing and mixing of culture, the diversity, and the rapid prototyping," he said. Within this topic, Smith, brought up a panel of representatives from Cloudera, Eclipse, Eucalyptus, Perrone Robotics, and Twitter--ideal manifestations of community and innovation in the world of Java.Marten Mickos, CEO, Eucalyptus Systems, explored his company's open source cloud software platform, written in Java, and used by gaming companies, technology companies, media companies, and more. Chris Aniszczyk, Operations Engineering,Twitter, noted the importance of the JVM in terms of their multiple-language development environment. Mike Olson, CEO, Cloudera, described his company's Apache Hadoop-based software, support, and training. Mike Milinkovich, Executive Director, Eclipse Foundation, noted that they have about 270 tools projects at Eclipse, with 267 of them written in Java. Milinkovich added that Eclipse will even be going into space in 2013, as part of the control software on various experiments aboard the International Space Station. Lastly, Paul Perrone, CEO, Perrone Robotics, detailed his company's robotics and automation software platform built 100% on Java, including Java SE and Java ME--"on rat, to cat, to elephant-sized systems." Milinkovic noted that communities are by nature so good at innovation because of their very openness--"The more open you make your innovation process, the more ideas are challenged, and the more developers are focused on justifying their choices all the way through the process."From there, Georges Saab, VP Development Java SE OpenJDK, continued the topic of innovation and helping the Java Community to "Make the Future Java." Martijn Verburg, representing the London Java Community (winner of a Duke's Choice Award 2012 for their activity in OpenJDK and JCP), soon joined Saab onstage. Verburg detailed the LJC's "Adopt a JSR" program--"to get day-to-day developers more involved in the innovation that's happening around them."  
From its London launching pad, the innovative program has spread to Brazil, Morocco, Latvia, India, and more.Other active participants in the program joined Verburg onstage--Ben Evans, London Java Community; James Gough, Stackthread; Bruno Souza, SOUJava; Richard Warburton, jClarity; and Cecelia Borg, Oracle--OpenJDK Onboarding. Together, the group explored the goals and tasks inherent in the Adopt a JSR program--from organizing hack days (testing prototype implementations), to managing mailing lists and forums, to triaging issues, to evangelism—all with the goal of fostering greater community/developer involvement, but equally importantly, building better open standards. “Come join us, and make your ecosystem better!" urged Verburg.Paul Perrone returned to profile the latest in his company's robotics work around Java--including the AARDBOTS family of smaller robotic vehicles, running the Perrone MAX platform on top of the Java JVM. Perrone took his "Rumbles" four-wheeled robot out for a spin onstage--a roaming, ARM-based security-bot vehicle, complete with IR, ultrasonic, and "cliff" sensors (the latter, for the raised stage at JavaOne). As an ultimate window into the future of robotics, Perrone displayed a "head-set" controller--a sensor directed at the forehead to monitor brainwaves, for the someday-implementation of brain-to-robot control.Then, just when it seemed this might be the end of the day's futuristic offerings, a mystery voice from offstage pronounced "I've got some toys"--proving to be guest-visitor James Gosling, there to explore his cutting-edge work with Liquid Robotics. While most think of robots as something with wheels or arms or lasers, Gosling explained, the Liquid Robotics vehicle is an entirely new and innovative ocean-going 'bot. Looking like a floating surfboard, with an attached set of underwater wings, the autonomous devices roam the oceans using only the energy of ocean waves to propel them, and a single actuated rudder to steer. "We have to accomplish all guidance just by wiggling the rudder," Gosling said. The devices offer applications from self-installing weather buoy, to pollution monitoring station, to marine mammal monitoring device, to climate change data gathering, to even ocean life genomic sampling. The early versions of the vehicle used C code on very tiny industrial micro controllers, where they had to "count the bytes one at a time."  But the latest generation vehicles, which just hit the water a week or so ago, employ an ARM processor running Linux and the ARM version of JDK 7. Gosling explained that vehicle communication from remote locations is achieved via the Iridium satellite network. But because of the costs of this communication path, the data must be sent in very small bursts--using SBD short burst data. "It costs $1/kb, so that rules everything in the software design,” said Gosling. “If you were trying to stream a Netflix video over this, it would cost a million dollars a movie. …We don't have a 'big data' problem," he quipped. There are currently about 150 Liquid Robotics vehicles out traversing the oceans. Gosling demonstrated real time satellite tracking of several vehicles currently at sea, noting that Java is actually particularly good at AI applications--due to the language having garbage collection, which facilitates complex data structures. 
To close-out his time onstage, Gosling of course participated in the ceremonial Java tee-shirt toss out to the audience…In parting, Chander passed the JavaOne Community Chairperson baton to Stephen Chin, Java Technology Evangelist, Oracle. Onstage in full motorcycle gear, Chin noted that he'll soon be touring Europe by motorcycle, meeting Java Community Members and streaming live via UStream--the ultimate manifestation of community and technology!  He also reminded attendees of the upcoming JavaOne Latin America 2012, São Paulo, Brazil (December 4-6, 2012), and stated that the CFP (call for papers) at the conference has been extended for one more week. "Remember, December is summer in Brazil!" Chin said.

    Read the article

  • EVENT RECAP: Oracle Day & Product Fair - Atlanta

    - by cwarticki
    Are you attending any of the Oracle Days and other Events? They are fantastic!  Keep track of the Oracle Events by following @OracleEvents on Twitter.  Also, stay in the know by subscribing to one of the several Oracle Newsletters. Those will also keep you posted on upcoming in-person and webcast events. From the Oracle Events website, simply navigate to your geography and refine your options to locate what interests you. You can also perform keyword searches. Yesterday, I had the opportunity to participate in the Oracle Day & Product Fair in Atlanta, Georgia.  Thanks to those who stopped by to ask your support questions and watch me demo My Oracle Support features and best practices. It was a fantastic turnout.  The Buckhead Theatre served as an excellent venue. It was standing room only for the double keynotes on topics of interest to our customers: Navigating Complexity by Simplifying I.T., and Oracle Exadata X3-Transforming Data Management. The Product Fair was staffed by many Oracle professionals and our Partners too.  It was great meeting new people like the team representing OAUG.  Many thanks to our sponsors: BIAS, Cloudera, Intel and TekStream Solutions. Come attend one of the many Oracle Days & other events planned for you. I'll be attending the one in Ft. Lauderdale, FL on November 16th. See you there. -Chris Warticki, Global Customer Management

    Read the article

  • New Big Data Appliance Security Features

    - by mgubar
    The Oracle Big Data Appliance (BDA) is an engineered system for big data processing.  It greatly simplifies the deployment of an optimized Hadoop Cluster – whether that cluster is used for batch or real-time processing.  The vast majority of BDA customers are integrating the appliance with their Oracle Databases and they have certain expectations – especially around security.  Oracle Database customers have benefited from a rich set of security features:  encryption, redaction, data masking, database firewall, label based access control – and much, much more.  They want similar capabilities with their Hadoop cluster.  Unfortunately, Hadoop wasn’t developed with security in mind.  By default, a Hadoop cluster is insecure – the antithesis of an Oracle Database.  Some critical security features have been implemented – but even those capabilities are arduous to set up and configure.  Oracle believes that a key element of an optimized appliance is that its data should be secure.  Therefore, by default the BDA delivers the “AAA of security”: authentication, authorization and auditing.

    Security Starts at Authentication
    A successful security strategy is predicated on strong authentication – for both users and software services.  Consider the default configuration for a newly installed Oracle Database; it’s been a long time since you had a legitimate chance at accessing the database using the credentials “system/manager” or “scott/tiger”.  The default Oracle Database policy is to lock accounts thereby restricting access; administrators must consciously grant access to users.

    Default Authentication in Hadoop
    By default, a Hadoop cluster fails the authentication test. For example, it is easy for a malicious user to masquerade as any other user on the system.  Consider the following scenario that illustrates how a user can access any data on a Hadoop cluster by masquerading as a more privileged user.  In our scenario, the Hadoop cluster contains sensitive salary information in the file /user/hrdata/salaries.txt.  When logged in as the hr user, you can see the following files.  Notice, we’re using the Hadoop command line utilities for accessing the data:

      $ hadoop fs -ls /user/hrdata
      Found 1 items
      -rw-r--r--   1 oracle supergroup         70 2013-10-31 10:38 /user/hrdata/salaries.txt
      $ hadoop fs -cat /user/hrdata/salaries.txt
      Tom Brady,11000000
      Tom Hanks,5000000
      Bob Smith,250000
      Oprah,300000000

    User DrEvil has access to the cluster – and can see that there is an interesting folder called “hrdata”.

      $ hadoop fs -ls /user
      Found 1 items
      drwx------   - hr supergroup          0 2013-10-31 10:38 /user/hrdata

    However, DrEvil cannot view the contents of the folder due to lack of access privileges:

      $ hadoop fs -ls /user/hrdata
      ls: Permission denied: user=drevil, access=READ_EXECUTE, inode="/user/hrdata":oracle:supergroup:drwx------

    Accessing this data will not be a problem for DrEvil. He knows that the hr user owns the data by looking at the folder’s ACLs. To overcome this challenge, he will simply masquerade as the hr user. On his local machine, he adds the hr user, assigns that user a password, and then accesses the data on the Hadoop cluster:

      $ sudo useradd hr
      $ sudo passwd
      $ su hr
      $ hadoop fs -cat /user/hrdata/salaries.txt
      Tom Brady,11000000
      Tom Hanks,5000000
      Bob Smith,250000
      Oprah,300000000

    Hadoop has not authenticated the user; it trusts that the identity that has been presented is indeed the hr user. Therefore, sensitive data has been easily compromised.
Clearly, the default security policy is inappropriate and dangerous to many organizations storing critical data in HDFS. Big Data Appliance Provides Secure Authentication The BDA provides secure authentication to the Hadoop cluster by default – preventing the type of masquerading described above. It accomplishes this thru Kerberos integration. Figure 1: Kerberos Integration The Key Distribution Center (KDC) is a server that has two components: an authentication server and a ticket granting service. The authentication server validates the identity of the user and service. Once authenticated, a client must request a ticket from the ticket granting service – allowing it to access the BDA’s NameNode, JobTracker, etc. At installation, you simply point the BDA to an external KDC or automatically install a highly available KDC on the BDA itself. Kerberos will then provide strong authentication for not just the end user – but also for important Hadoop services running on the appliance. You can now guarantee that users are who they claim to be – and rogue services (like fake data nodes) are not added to the system. It is common for organizations to want to leverage existing LDAP servers for common user and group management. Kerberos integrates with LDAP servers – allowing the principals and encryption keys to be stored in the common repository. This simplifies the deployment and administration of the secure environment. Authorize Access to Sensitive Data Kerberos-based authentication ensures secure access to the system and the establishment of a trusted identity – a prerequisite for any authorization scheme. Once this identity is established, you need to authorize access to the data. HDFS will authorize access to files using ACLs with the authorization specification applied using classic Linux-style commands like chmod and chown (e.g. hadoop fs -chown oracle:oracle /user/hrdata changes the ownership of the /user/hrdata folder to oracle). Authorization is applied at the user or group level – utilizing group membership found in the Linux environment (i.e. /etc/group) or in the LDAP server. For SQL-based data stores – like Hive and Impala – finer grained access control is required. Access to databases, tables, columns, etc. must be controlled. And, you want to leverage roles to facilitate administration. Apache Sentry is a new project that delivers fine grained access control; both Cloudera and Oracle are the project’s founding members. Sentry satisfies the following three authorization requirements: Secure Authorization:  the ability to control access to data and/or privileges on data for authenticated users. Fine-Grained Authorization:  the ability to give users access to a subset of the data (e.g. column) in a database Role-Based Authorization:  the ability to create/apply template-based privileges based on functional roles. With Sentry, “all”, “select” or “insert” privileges are granted to an object. The descendants of that object automatically inherit that privilege. A collection of privileges across many objects may be aggregated into a role – and users/groups are then assigned that role. This leads to simplified administration of security across the system. Figure 2: Object Hierarchy – granting a privilege on the database object will be inherited by its tables and views. Sentry is currently used by both Hive and Impala – but it is a framework that other data sources can leverage when offering fine-grained authorization. 
For example, one can expect Sentry to deliver authorization capabilities to Cloudera Search in the near future. Audit Hadoop Cluster Activity Auditing is a critical component to a secure system and is oftentimes required for SOX, PCI and other regulations. The BDA integrates with Oracle Audit Vault and Database Firewall – tracking different types of activity taking place on the cluster: Figure 3: Monitored Hadoop services. At the lowest level, every operation that accesses data in HDFS is captured. The HDFS audit log identifies the user who accessed the file, the time that file was accessed, the type of access (read, write, delete, list, etc.) and whether or not that file access was successful. The other auditing features include: MapReduce:  correlate the MapReduce job that accessed the file Oozie:  describes who ran what as part of a workflow Hive:  captures changes were made to the Hive metadata The audit data is captured in the Audit Vault Server – which integrates audit activity from a variety of sources, adding databases (Oracle, DB2, SQL Server) and operating systems to activity from the BDA. Figure 4: Consolidated audit data across the enterprise.  Once the data is in the Audit Vault server, you can leverage a rich set of prebuilt and custom reports to monitor all the activity in the enterprise. In addition, alerts may be defined to trigger violations of audit policies. Conclusion Security cannot be considered an afterthought in big data deployments. Across most organizations, Hadoop is managing sensitive data that must be protected; it is not simply crunching publicly available information used for search applications. The BDA provides a strong security foundation – ensuring users are only allowed to view authorized data and that data access is audited in a consolidated framework.

    Read the article

1 2  | Next Page >