Search Results

Search found 401 results on 17 pages for 'hadoop'.

Page 3/17 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Pig and Pig Latin in Big Data Story. In this article we will understand what is Sqoop and Zookeeper in Big Data Story. There are two most important components one should learn when learning about interacting with Hadoop – Sqoop and Zookper. What is Sqoop? Most of the business stores their data in RDBMS as well as other data warehouse solutions. They need a way to move data to the Hadoop system to do various processing and return it back to RDBMS from Hadoop system. The data movement can happen in real time or at various intervals in bulk. We need a tool which can help us move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool which extract data from non-Hadoop data sources and transform them into the format which Hadoop can use it and later it loads them into HDFS. Essentially it is ETL tool where it Extracts, Transform and Load from SQL to Hadoop. The best part is that it also does extract data from Hadoop and loads them to Non-SQL (or RDBMS) data stores. Essentially, Sqoop is a command line tool which does SQL to Hadoop and Hadoop to SQL. It is a command line interpreter. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and easy to learn tool for nonprogrammers. What is Zookeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words Zookeeper is a replicated synchronization service with eventual consistency. In simpler words – in Hadoop cluster there are many different nodes and one node is master. Let us assume that master node fails due to any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is managing the writers as that task requires persistence in order of writing. In this kind of scenario Zookeeper will assign new master node and make sure that Hadoop cluster performs without any glitch. Zookeeper is the Hadoop’s method of coordinating all the elements of these distributed systems. Here are few of the tasks which Zookeepr is responsible for. Zookeeper manages the entire workflow of starting and stopping various nodes in the Hadoop’s cluster. In Hadoop cluster when any processes need certain configuration to complete the task. Zookeeper makes sure that certain node gets necessary configuration consistently. In case of the master node fails, Zookeepr can assign new master node and make sure cluster works as expected. There many other tasks Zookeeper performance when it is about Hadoop cluster and communication. Basically without the help of Zookeeper it is not possible to design any new fault tolerant distributed application. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Big Data Analytics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • hadoop beginners question

    - by Omnipresent
    I've read some documentation about hadoop and seen the impressive results. I get the bigger picture but am finding it hard whether it would fit our setup. Question isnt programming related but I'm eager to get opinion of people who currently work with hadoop and how it would fit our setup: We use Oracle for backend Java (Struts2/Servlets/iBatis) for frontend Nightly we get data which needs to be summarized. this runs as a batch process (takes 5 hours) We are looking for a way to cut those 5 hours to a shorter time. Where would hadoop fit into this picture? Can we still continue to use Oracle even after hadoop?

    Read the article

  • Project Idea with Hadoop MapReduce

    - by Aditya Andhalikar
    Hello, I learnt Hadoop a few months back and managed to do a very introductory programming project on it. I want to do a small - medium sized project or series of small programming assignments with Hadoop. I have seen lot of ideas around but I dont see anything that can be finished in about 60-70 hours of work so a pretty small scale project as I want to do that in my spare time along with other studies. Most project ideas I have seen sort of large to go on for 2-3 months. My main objective out of this exercise to develop good expertise in programming with Hadoop environment not to do any research or solve specific problems. I see Hadoop being used lot of with webservices maybe that would be an interesting track for small projects. Thank you in advance. Regards, Aditya

    Read the article

  • Hadoop reduce task gets hung

    - by user806098
    I set up a hadoop cluster with 4 nodes, When running a map-reduce task, the map task finishes quickly, while the reduce task hangs at 27% percent. I checked the log, it's that the reduce task fails to fetch map output from map nodes. The job tracker log of master shows messages like this: 2011-06-27 19:55:14,748 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201106271953_0001_r_000000_0' to tip task_201106271953_0001_r_000000, for tracker 'tracker_web30.bbn.com.cn:localhost/127.0.0.1:56476' And the name node log of master shows messages like this: 2011-06-27 14:00:52,898 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310, call register(DatanodeRegistration(202.106.199.39:50010, storageID=DS-1989397900-202.106.199.39-50010-1308723051262, infoPort=50075, ipcPort=50020)) from 192.168.225.19:16129: error: java.io.IOException: verifyNodeRegistration: unknown datanode 202.106.199.3 9:50010 However, neither the "web30.bbn.com.cn" or 202.106.199.39, 202.106.199.3 is the slave node. I think such ip/domains appear because hadoop fails to resolve a node(first in the Intranet DNS server), then it goes to a higher-level DNS server, later to the top, still fails, then the "junk" ip/domains are returned. But I checked my config, it goes like this: /etc/hosts: 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 192.168.225.16 master 192.168.225.66 slave1 192.168.225.20 slave5 192.168.225.17 slave17 conf/core-site.xml: hadoop.tmp.dir /root/hadoop_tmp/hadoop_${user.name} fs.default.name hdfs://master:54310 io.sort.mb 1024 hdfs-site.xml: dfs.replication 3 masters: master slaves: master slave1 slave5 slave17 Also, all firewalls(iptables) are turned off, and ssh between each 2 nodes is ok. so I don't know where exact the error comes from. Please help. Thanks a lot.

    Read the article

  • How to learn using Hadoop

    - by ajay
    hi, I want to learn hadoop. However, I don't have access to a cluster now. Is it possible for me to learn it and use it for writing programs and learn it properly. Would it be helpful to run multiple linux VMs and then use them as boxes to run hadoop? Or you think that is more of a stretch and running it on a multiple hosts is the same as running in on single host (in terms of setup, Hadoop API used, the architecture of the map-reduce programs etc) Thanks,

    Read the article

  • Configuring Hadoop logging to avoid too many log files

    - by Eric Wendelin
    I'm having a problem with Hadoop producing too many log files in $HADOOP_LOG_DIR/userlogs (the Ext3 filesystem allows only 32000 subdirectories) which looks like the same problem in this question: http://stackoverflow.com/questions/2091287/error-in-hadoop-mapreduce My question is: does anyone know how to configure Hadoop to roll the log dir or otherwise prevent this? I'm trying to avoid just setting the "mapred.userlog.retain.hours" and/or "mapred.userlog.limit.kb" properties because I want to actually keep the log files. I was also hoping to configure this in log4j.properties, but looking at the Hadoop 0.20.2 source, it writes directly to logfiles instead of actually using log4j. Perhaps I don't understand how it's using log4j fully. Any suggestions or clarifications would be greatly appreciated.

    Read the article

  • Typical Hadoop setup for remote job submission

    - by Artii
    So I am still a bit new to hadoop and am currently in the process of setting up a small test cluster on Amazonaws. So my question relates to some tips on the structuring of the cluster so it is possible to work submit jobs from remote machines. Currently I have 5 machines. 4 are basically the Hadoop cluster with the NameNodes, Yarn etc. One machine is used as a manager machine( Cloudera Manager). I am gonna describe my thinking process on the setup and if anyone can chime in the points I am not clear with, that would be great. I was thinking what was the best setup for a small cluster. So I decided to expose only one manager machine and probably use that to submit all the jobs through it. The other machines will see each other etc, but not be accessible from the outside world. I am have conceptual idea on how to do this,but I am not sure how to properly go about doing this though, if anyone could point me in the right direction that would great. Also another big point is, I want to be able to submit jobs to the cluster through exposed machine from a client machine (might be Windows). I am not so clear on this setup as well. Do I need to have Hadoop installed on the machine in order to use the normal hadoop commands, and to write/submit jobs say from Eclipse or something similar. So to sum it up my questions are, Is this an ok setup for a small test cluster How can I go about using one exposed machine to submit/route jobs to the cluster, without having any of the Hadoop nodes on it. How do I setup a client machine to submit jobs to a remote cluster, and an example on how to do it on Windows. Also if there are any reason not to use Windows as a client machine in this setup. Thanks I would greatly appreciate any advice or help on this.

    Read the article

  • Hadoop Map Reduce job never finishes

    - by rohanbk
    I am running a Hadoop Map Reduce job using a Python Mapper and Reducer script, and Hadoop Streaming. Both my Map and Reduce jobs run till they are both 100%, but the job doesn't end. I know that when things go sour, Hadoop will terminate the job, but in this case, both stages reach a 100% and just never end. Has anyone else encountered anything similar? Also, how do I debug my program to figure out where things are going wrong? If I use a smaller input file, and I just run something like: $> cat input_file | mapper.py | sort | reduce.py >> output_file everything works perfectly fine. However, when I use Hadoop, things don't work out.

    Read the article

  • Any tested Frameworks/Solutions similar to Apache Hadoop?

    - by andreas
    Hello, I am interested in the Apache Hadoop project, but i would like to know if any other tested (please mind the 'tested') projects/frameworks are out there. Appreciate any information/links to projects similar to Apache Hadoop and any comments on the Apache Hadoop project from anyone that has used it. Regards,

    Read the article

  • MySQL and Hadoop Integration - Unlocking New Insight

    - by Mat Keep
    “Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes. As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL. The new Guide to MySQL and Hadoop presents the tools enabling integration between the two data platforms, supporting the data lifecycle from acquisition and organisation to analysis and visualisation / decision, as shown in the figure below The Guide details each of these stages and the technologies supporting them: Acquire: Through new NoSQL APIs, MySQL is able to ingest high volume, high velocity data, without sacrificing ACID guarantees, thereby ensuring data quality. Real-time analytics can also be run against newly acquired data, enabling immediate business insight, before data is loaded into Hadoop. In addition, sensitive data can be pre-processed, for example healthcare or financial services records can be anonymized, before transfer to Hadoop. Organize: Data is transferred from MySQL tables to Hadoop using Apache Sqoop. With the MySQL Binlog (Binary Log) API, users can also invoke real-time change data capture processes to stream updates to HDFS. Analyze: Multi-structured data ingested from multiple sources is consolidated and processed within the Hadoop platform. Decide: The results of the analysis are loaded back to MySQL via Apache Sqoop where they inform real-time operational processes or provide source data for BI analytics tools. So how are companies taking advantage of this today? As an example, on-line retailers can use big data from their web properties to better understand site visitors’ activities, such as paths through the site, pages viewed, and comments posted. This knowledge can be combined with user profiles and purchasing history to gain a better understanding of customers, and the delivery of highly targeted offers. Of course, it is not just in the web that big data can make a difference. Every business activity can benefit, with other common use cases including: - Sentiment analysis; - Marketing campaign analysis; - Customer churn modeling; - Fraud detection; - Research and Development; - Risk Modeling; - And more. As the guide discusses, Big Data is promising a significant transformation of the way organizations leverage data to run their businesses. MySQL can be seamlessly integrated within a Big Data lifecycle, enabling the unification of multi-structured data into common data platforms, taking advantage of all new data sources and yielding more insight than was ever previously imaginable. Download the guide to MySQL and Hadoop integration to learn more. I'd also be interested in hearing about how you are integrating MySQL with Hadoop today, and your requirements for the future, so please use the comments on this blog to share your insights.

    Read the article

  • Amazon Elastic MapReduce: Exception from FileSystem

    - by S.N
    Hi, I run my application using ruby client: ruby elastic-mapreduce -j j-20PEKMT9BRSUC --jar s3n://sakae55/lib/edu.cit.som.jar --main-class edu.cit.som.hadoop.SOMDriver --arg s3n://sakae55/repository/input/ecoli/ --arg s3n://sakae55/repository/output/ecoli/pl/ --arg s3n://sakae55/repository/data/ecoli/som.txt Then, I am seeing the following error: java.lang.IllegalArgumentException: This file system object (file:///) does not support access to the request path 'hdfs://i -10-195-207-230.ec2.internal:9000/mnt/var/lib/hadoop/tmp/mapred/system/job_201004221221_0017/job.jar' You possibly called Fi eSystem.get(conf) when you should of called FileSystem.get(uri, conf) to obtain a file system supporting your path. at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:52) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:416) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:259) at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:676) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:200) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1184) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1160) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1132) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:662) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:729) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026) at edu.cit.som.hadoop.SOMDriver.runIteration(SOMDriver.java:106) at edu.cit.som.hadoop.SOMDriver.train(SOMDriver.java:69) at edu.cit.som.hadoop.SOMDriver.run(SOMDriver.java:52) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at edu.cit.som.hadoop.SOMDriver.main(SOMDriver.java:36) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:155) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) I am not sure why the error references to "file:///" even though all the arguments I pass do not use the schema.

    Read the article

  • How a .NET Programmer learn Big Data/Hadoop? [on hold]

    - by Smith Pascal Jr.
    I have been ASP.NET developer for sometime now and I have been reading a lot about Big Data- Hadoop and its future as to how it is the next technology in IT and how it would be useful to create million of jobs in US and elsewhere in the world. Now since Hadoop is an open source big data tool which is managed by Apache Server Foundation Group, I'm assuming I have to be well aware of JAVA - Correct me if I'm wrong. Moreover, How a .NET programmer can learn Big Data and its related technologies and can work professionally full time into this technology? What challenges and opportunities does a .NET professional face while changing the technology platform? Please advice. Thanks

    Read the article

  • Hive metadata permission issue

    - by Chandramohan
    We are getting this error on Hive, while creating a DB / table hive> CREATE TABLE pokes (foo INT, bar STRING); FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. NestedThrowables: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask Hive log : org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:298) at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:601) at org.datanucleus.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:286) at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:182) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1953) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:234) at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:261) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:196) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:171) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:354) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:306) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:451) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:232) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:108) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1868) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1878) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:470) ... 15 more Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114) at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:521) at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:290) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:588) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:300) at org.datanucleus.ObjectManagerFactoryImpl.initialiseStoreManager(ObjectManagerFactoryImpl.java:161) at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:583) ... 42 more Caused by: java.util.NoSuchElementException: Could not create a validated object, cause: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection. at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1191) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) ... 52 more 2011-08-11 18:02:36,964 ERROR ql.Driver (SessionState.java:printError(343)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

    Read the article

  • ClassNotFoundException error in implementing Bayesian algorithm in Apache Mahout on Hadoop

    - by Shweta
    Hi, I have a problem in executing the Bayesian algorithm in Mahout. I built it with Maven and the job file is in target directory. When run from terminal using hadoop, I'm getting the ClassNotFoundException error. What should be done? $HADOOP_HOME/bin/hadoop jar mahout-core-0.3-SNAPSHOT.job org.apache.mahout.classifier.bayes.mapreduce.bayes.bayesdriver -i test -o output Exception in thread "main" java.lang.ClassNotFoundException: org.apache.mahout.classifier.bayes.mapreduce.bayes.bayesdriver at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

    Read the article

  • Hadoop on windows server

    - by Luca Martinetti
    Hello, I'm thinking about using hadoop to process large text files on my existing windows 2003 servers (about 10 quad core machines with 16gb of RAM) The questions are: Is there any good tutorial on how to configure an hadoop cluster on windows? What are the requirements? java + cygwin + sshd ? Anything else? HDFS, does it play nice on windows? I'd like to use hadoop in streaming mode. Any advice, tool or trick to develop my own mapper / reducers in c#? What do you use for submitting and monitoring the jobs? Thanks

    Read the article

  • Why isn't Hadoop implemented using MPI?

    - by artif
    Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes. What are the technical reasons for this? I could hazard a few guesses, but I do not know enough of how MPI is implemented "under the hood" to know whether or not I'm right. Come to think of it, I'm not entirely familiar with Hadoop's internals either. I understand the framework at a conceptual level (map/combine/shuffle/reduce and how that works at a high level) but I don't know the nitty gritty implementation details. I've always assumed Hadoop was transmitting serialized data structures (perhaps GPBs) over a TCP connection, eg during the shuffle phase. Let me know if that's not true.

    Read the article

  • Hadoop is not able to find JAVA_HOME properly

    - by Shekhar
    I am trying to run hadoop my Ubuntu OS. I have set JAVA_HOME variable in ~/.bashrc file to /usr/lib/jvm/jdk1.7.0_01/ but when I run hadoop namenode -format command it fails with following errors : shekhar@ubuntu:/usr$ hadoop namenode -format Warning: $HADOOP_HOME is deprecated. /host/Shekhar/Softwares/hadoop-1.0.0/bin/hadoop: line 321: /usr/jdk1.7.0_01/bin/java: No such file or directory /host/Shekhar/Softwares/hadoop-1.0.0/bin/hadoop: line 387: /usr/jdk1.7.0_01/bin/java: No such file or directory hadoop tries to locate java command at /usr/jdk1.7.0_01/bin/ path. Clearly somehow it missed /lib/jvm folder. I am not able to understand why and how this is happening. my echo $PATH command gives following output : shekhar@ubuntu:/usr$ echo $PATH /usr/lib/jvm/jdk1.7.0_01/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/jdk1.7.0_01/bin:/host/Shekhar/Softwares/hadoop-1.0.0/bin If I run which java command I get following output : shekhar@ubuntu:/usr$ which java /usr/lib/jvm/jdk1.7.0_01/bin/java and echo $JAVA_HOME returns following output : shekhar@ubuntu:/usr$ echo $JAVA_HOME /usr/lib/jvm/jdk1.7.0_01 I would like to know why hadoop is taking JAVA_HOME path incorrectly. Please help...

    Read the article

  • Hadoop streaming job : stuck

    - by Algorist
    Hi, I am running a hadoop streaming job. It got stuck due to no reason. I am not sure how to cancel the task, so that hadoop schedules another task for the same job. I tried killing the job, but it still doesn't work. Anyone know, how to do this? Thank you Bala

    Read the article

  • Repository organization for Hadoop project

    - by Alex N.
    I am starting on a new Hadoop project that will have multiple hadoop jobs(and hence multiple jar files). Using mercurial for source control, I was wondering what would be optimal way of organizing the repository structure? Should each job live in separate repo or would it be more efficient to keep them in the same, but break down into folders?

    Read the article

  • Managing dependencies with Hadoop Streaming?

    - by beagleguy
    hi all, had a quick hadoop streaming question.. If I'm using python streaming and I have python packages my mappers/reducers require that aren't installed by default do I need to install those on all the hadoop machines as well or is there some sort of serialization that sends them to the remote machines? thanks!

    Read the article

  • Hadoop: Processing large serialized objects

    - by restrictedinfinity
    I am working on development of an application to process (and merge) several large java serialized objects (size of order GBs) using Hadoop framework. Hadoop stores distributes blocks of a file on different hosts. But as deserialization will require the all the blocks to be present on single host, its gonna hit the performance drastically. How can I deal this situation where different blocks have to cant be individually processed, unlike text files ?

    Read the article

  • can i use hadoop cloudera without root access?

    - by in_the_cloud
    a bit of a binary question (okay, not excatly) - but was wondering if one is able to configure cloudera / hadoop to run at the nodes without root shell access to the node computers (although i can setup ssh passwordless login)? appears from their instructions that root access is needed, at yet i found a hadoop wiki which suggest root access might not be needed ? http://wiki.apache.org/nutch/NutchHadoopTutorial

    Read the article

  • Error when running a basic Hadoop code

    - by Abhishek Shivkumar
    I am running a hadoop code that has a partitioner class inside the job. But, when I run the command hadoop jar Sort.jar SecondarySort inputdir outputdir I am getting a runtime error that says class KeyPartitioner not org.apache.hadoop.mapred.Partitioner. I have ensured that the KeyPartitioner class has extended the Partitioner class, but why am I getting this error? Here is the driver code: JobConf conf = new JobConf(getConf(), SecondarySort.class); conf.setJobName(SecondarySort.class.getName()); conf.setJarByClass(SecondarySort.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); conf.setMapOutputKeyClass(StockKey.class); conf.setMapOutputValueClass(Text.class); conf.setPartitionerClass((Class<? extends Partitioner<StockKey, DoubleWritable>>) KeyPartitioner.class); conf.setMapperClass((Class<? extends Mapper<LongWritable, Text, StockKey, DoubleWritable>>) StockMapper.class); conf.setReducerClass((Class<? extends Reducer<StockKey, DoubleWritable, Text, Text>>) StockReducer.class); and here is the code of the partitioner class: public class KeyPartitioner extends Partitioner<StockKey, Text> { @Override public int getPartition(StockKey arg0, Text arg1, int arg2) { int partition = arg0.name.hashCode() % arg2; return partition; } }

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >