Search Results

Search found 2017 results on 81 pages for 'hadoop streaming'.

Page 7/81 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >

  • Hadoop write directory

    - by FaultyJuggler
    Simple question, but after reading through the documentation and configuration I can't quite figure it out. How do I A) know where Hadoop is writing to on the local disk, and B) change that? For initial testing I set up HDFS on a 20 GB Linux VM; we have since added a 500 GB networked drive to move towards prototyping the full system. So how do I now point HDFS at that drive? Or do I simply move the home directory/install with some slight change in setup and restart the process?

    Read the article

  • Hadoop On Azure C#Isotope

    - by Sreesankar
    During the initial release of HadoopOnAzure, Microsoft provided the C#Isotope SDK as a programmatic interface to the Hadoop cluster on Azure. After the HDInsight release it was removed from the downloads. Moreover, when trying the previous version of the SDK we get a 500 - Internal Server Error. Any idea whether this service has been disabled? If so, what's the alternative way to programmatically interface with the HDInsight cluster on Azure?

    Read the article

  • Efficient way to store a graph for calculation in Hadoop

    - by user337499
    I am currently trying to perform calculations like clustering coefficient on huge graphs with the help of Hadoop. Therefore I need an efficient way to store the graph in a way that I can easily access nodes, their neighbors and the neighbors' neighbors. The graph is quite sparse and stored in a huge tab separated file where the first field is the node from which an edge goes to the second node in field two. Thanks in advance!
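
    One common starting point for this kind of question is to convert the edge list into an adjacency list, so that a node's neighbours (and their neighbours) can be looked up directly. The following is a minimal plain-Java sketch of that idea, not a Hadoop job: the class and method names are hypothetical, the graph is treated as undirected (an assumption; the file described above stores directed edges), and everything is held in memory purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: adjacency list built from "source<TAB>target" lines.
final class AdjacencyListSketch {

    // Builds an undirected adjacency list; malformed lines and self-loops are skipped.
    static Map<String, Set<String>> fromEdgeLines(List<String> lines) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2 || fields[0].equals(fields[1])) {
                continue;
            }
            adj.computeIfAbsent(fields[0], k -> new HashSet<>()).add(fields[1]);
            adj.computeIfAbsent(fields[1], k -> new HashSet<>()).add(fields[0]);
        }
        return adj;
    }

    // Local clustering coefficient: existing links between the neighbours of
    // a node, divided by the number of links that could exist between them.
    static double clusteringCoefficient(Map<String, Set<String>> adj, String node) {
        Set<String> neighbours = adj.getOrDefault(node, Collections.emptySet());
        int k = neighbours.size();
        if (k < 2) {
            return 0.0;
        }
        int links = 0;
        for (String u : neighbours) {
            for (String v : neighbours) {
                // Count each unordered pair once.
                if (u.compareTo(v) < 0 && adj.get(u).contains(v)) {
                    links++;
                }
            }
        }
        return 2.0 * links / ((double) k * (k - 1));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> adj =
                fromEdgeLines(Arrays.asList("A\tB", "A\tC", "B\tC", "A\tD"));
        System.out.println(clusteringCoefficient(adj, "A"));
    }
}
```

    In a MapReduce setting the same structure could be written as one line per node (node, then its neighbour list), so that a single record carries everything a mapper needs for that node.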

    Read the article

  • Changing the block size of a dfs file in Hadoop

    - by Sam
    I found that my map tasks are currently inefficient when parsing one particular set of files (2 TB in total). I'd like to change the block size of these files in the Hadoop DFS (from 64 MB to 128 MB). I can't find in the documentation how to do this for only one set of files rather than for the entire cluster. Does anyone know the command that would change the block size when I upload the files (i.e. copy from local to the DFS)? Thanks!

    Read the article

  • Streaming audio from microphone to network

    - by Janusz
    I have the following problem: I want to stream the audio I record on one machine to another machine in the same network. It seems that VLC is the best shot at the moment. I was able to stream a music file via VLC, but streaming the audio from the microphone the same way doesn't work. EDIT: If I enable "play locally", the captured sound is played. Even streaming to another instance of VLC on the same machine doesn't work.

    Read the article

  • How to monitor streaming servers

    - by pcdinh
    Hi all, I have a bunch of Linux-based streaming servers that use the lighttpd web server to provide video streaming via port 80. Recently, our service has been very slow. Therefore, I would like to ask if there is a good software package that helps us monitor and record our bandwidth usage, lighttpd established connections, TCP sync connections, disk I/O ... over time. Any suggestions? Regards, Dinh

    Read the article

  • How to setup Hadoop cluster so that it accepts mapreduce jobs from remote computers?

    - by drasto
    There is a computer I use for Hadoop map/reduce testing. This computer runs 4 Linux virtual machines (using Oracle VirtualBox). Each of them has Cloudera with Hadoop (distribution c3u4) installed and serves as a node of the Hadoop cluster. One of those 4 nodes is the master node running the namenode and jobtracker; the others are slave nodes. Normally I use this cluster from the local network for testing. However, when I try to access it from another network I cannot send any jobs to it. The computer running the Hadoop cluster has a public IP and can be reached over the internet for other services. For example, I am able to get the HDFS (namenode) administration site and the map/reduce (jobtracker) administration site (on ports 50070 and 50030 respectively) from a remote network. It is also possible to use Hue. Ports 8020 and 8021 are both allowed. What is blocking my map/reduce job submissions from reaching the cluster? Is there some setting that I must change first in order to be able to submit map/reduce jobs remotely?

    Here is my mapred-site.xml file:

        <configuration>
          <property>
            <name>mapred.job.tracker</name>
            <value>master:8021</value>
          </property>
          <!-- Enable Hue plugins -->
          <property>
            <name>mapred.jobtracker.plugins</name>
            <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
            <description>Comma-separated list of jobtracker plug-ins to be activated.</description>
          </property>
          <property>
            <name>jobtracker.thrift.address</name>
            <value>0.0.0.0:9290</value>
          </property>
        </configuration>

    And this is in the /etc/hosts file:

        192.168.1.15 master
        192.168.1.14 slave1
        192.168.1.13 slave2
        192.168.1.9  slave3

    Read the article

  • Read files from directory to create a ZIP hadoop

    - by Félix
    I'm looking for Hadoop examples, something more complex than the wordcount example. What I want to do is read the files in a directory in Hadoop and produce a zip, so I thought I would collect all the files in the map class and create the zip file in the reduce class. Can anyone give me a link to a tutorial or example that can help me build it? I don't want anyone to do this for me; I'm asking for a link with better examples than wordcount. This is what I have, maybe it's useful for someone:

        public class Testing {

            private static class MapClass extends MapReduceBase
                    implements Mapper<LongWritable, Text, Text, BytesWritable> {

                // reuse objects to save overhead of object creation
                Logger log = Logger.getLogger("log_file");

                public void map(LongWritable key, Text value,
                        OutputCollector<Text, BytesWritable> output, Reporter reporter)
                        throws IOException {
                    String line = value.toString();
                    log.info("Doing something ... " + line);
                    // Emit the line as the key and its raw bytes as the value.
                    byte[] bytes = line.getBytes();
                    BytesWritable b = new BytesWritable();
                    b.set(bytes, 0, bytes.length);
                    output.collect(value, b);
                }
            }

            private static class ReduceClass extends MapReduceBase
                    implements Reducer<Text, BytesWritable, Text, BytesWritable> {

                Logger log = Logger.getLogger("log_file");
                ByteArrayOutputStream bout;
                ZipOutputStream out;
                int entryCount = 0;

                @Override
                public void configure(JobConf job) {
                    super.configure(job);
                    log.setLevel(Level.INFO);
                    bout = new ByteArrayOutputStream();
                    out = new ZipOutputStream(bout);
                }

                public void reduce(Text key, Iterator<BytesWritable> values,
                        OutputCollector<Text, BytesWritable> output, Reporter reporter)
                        throws IOException {
                    while (values.hasNext()) {
                        BytesWritable value = values.next();
                        // Entry names must be unique within one zip stream.
                        ZipEntry entry = new ZipEntry("entry-" + entryCount++);
                        out.putNextEntry(entry);
                        // getBytes() may return a padded buffer; only getLength() bytes are valid.
                        out.write(value.getBytes(), 0, value.getLength());
                        out.closeEntry();
                    }
                    BytesWritable b = new BytesWritable();
                    b.set(bout.toByteArray(), 0, bout.size());
                    output.collect(key, b);
                }

                @Override
                public void close() throws IOException {
                    // TODO Auto-generated method stub
                    super.close();
                    out.close();
                }
            }

            /**
             * Runs the demo.
             */
            public static void main(String[] args) throws IOException {
                int mapTasks = 20;
                int reduceTasks = 1;

                JobConf conf = new JobConf(Testing.class);
                conf.setJobName("testing");
                conf.setNumMapTasks(mapTasks);
                conf.setNumReduceTasks(reduceTasks);

                MultipleInputs.addInputPath(conf, new Path("/messages"),
                        TextInputFormat.class, MapClass.class);
                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(BytesWritable.class);
                FileOutputFormat.setOutputPath(conf, new Path("/czip"));

                conf.setMapperClass(MapClass.class);
                conf.setCombinerClass(ReduceClass.class);
                conf.setReducerClass(ReduceClass.class);

                // Delete the output directory if it exists already
                JobClient.runJob(conf);
            }
        }
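
    The zip-writing step in the reducer can be tried out on its own, without Hadoop. The sketch below is a hypothetical plain-Java illustration (the class name, method names, and the "entry-N" naming scheme are assumptions, not part of the question): it writes several payloads into one in-memory archive and closes the stream before reading the bytes, since `ZipOutputStream` only writes the central directory on close.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration: pack byte payloads into a single zip archive in memory.
final class ZipPayloadsSketch {

    static byte[] zip(List<byte[]> payloads) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            int index = 0;
            for (byte[] payload : payloads) {
                // Each entry needs its own name; a repeated name is rejected.
                zip.putNextEntry(new ZipEntry("entry-" + index++));
                zip.write(payload);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the archive is complete.
        return buffer.toByteArray();
    }

    // Reads the archive back and lists the entry names in order.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] archive = zip(Arrays.asList(
                "first".getBytes(StandardCharsets.UTF_8),
                "second".getBytes(StandardCharsets.UTF_8)));
        System.out.println(entryNames(archive));
    }
}
```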

    Read the article

  • Chaining multiple MapReduce jobs in Hadoop.

    - by Niels Basjes
    In many real-life situations where you apply MapReduce, the final algorithm ends up being several MapReduce steps, i.e. Map1, Reduce1, Map2, Reduce2, etc. So the output from the last reduce is needed as the input for the next map. The intermediate data is something you (in general) do not want to keep once the pipeline has completed successfully. Also, because this intermediate data is in general some data structure (like a 'map' or a 'set'), you don't want to put too much effort into writing and reading these key-value pairs. What is the recommended way of doing that in Hadoop? Is there a (simple) example that shows how to handle this intermediate data in the correct way, including the cleanup afterward? Thanks.

    Read the article

  • OpenStreetMap and Hadoop

    - by portoalet
    Hi, I need some ideas for a weekend project about Hadoop and OpenStreetMap. I have access to an AWS EC2 instance with an OpenStreetMap snapshot in my EBS volume. The OpenStreetMap data is in a PostgreSQL database. What kind of MapReduce function can be run on the OpenStreetMap data, assuming I can export it into XML format and then place it into HDFS? In other words, I am having a brain cramp at the moment and cannot think of a MapReduce operation that would extract valuable insight from the OpenStreetMap XML (e.g. extracting all the places designated as a park or golf course, but this needs to be done only once, not continuously). Many thanks

    Read the article

  • Free Large datasets to experiment with Hadoop

    - by Sundar
    Do you know of any large datasets for experimenting with Hadoop that are free or low cost? Any related pointers/links are appreciated. Preference: at least one GB of data; production log data from a web server. The few I have found so far: http://dumps.wikimedia.org/enwiki/20100130/ http://wiki.freebase.com/wiki/Data_dumps http://aws.amazon.com/publicdatasets/ Also, can we run our own crawler to gather data from sites, e.g. Wikipedia? Any pointers on how to do this are appreciated as well.

    Read the article

  • Free data warehouse - Infobright, Hadoop/Hive or what ?

    - by peperg
    I need to store a large amount of small data objects (millions of rows per month). Once they're saved they won't change. I need to:
      - store them securely
      - use them for analysis (mostly time-oriented)
      - retrieve some raw data occasionally
    It would be nice if it could be used with JasperReports or BIRT. My first shot was Infobright Community, just a column-oriented, read-only storage mechanism for MySQL. On the other hand, people say that a NoSQL approach could be better. Hadoop+Hive looks promising, but the documentation looks poor and the version number is less than 1.0. I heard about Hypertable, Pentaho, MongoDB... Do you have any recommendations? (Yes, I found some topics here, but they were from a year or two ago.)

    Read the article

  • How does Hadoop perform input splits?

    - by Deepak Konidena
    Hi, This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of the form <k,v>, where k is the offset of the line from the beginning and v is the content of the line. Now, when we say that we want to run N map tasks, does the framework split the input file into N splits and run each map task on that split? Or do we have to write a partitioning function that does the N splits and run each map task on the split generated? All I want to know is whether the splits are done internally or whether we have to split the data manually. More specifically, each time the map() function is called, what are its Key key and Value val parameters? Thanks, Deepak
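
    As background for this question: in the classic `org.apache.hadoop.mapred` API, file-based input formats compute the splits themselves, and the requested number of map tasks acts only as a hint. The plain-Java sketch below paraphrases that arithmetic from memory as a simplified model, not the actual Hadoop source. The class name and the `splits` helper are hypothetical, and details such as block locations, compressed files, and empty files are ignored. With `TextInputFormat`, each map() call then receives the byte offset of a line as the key and the line's text as the value.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how a file-based input format sizes its splits.
final class InputSplitSketch {

    // A final chunk up to 10% larger than the split size is not split again.
    static final double SPLIT_SLOP = 1.1;

    // The split size is capped by the block size and floored by the minimum size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Returns {offset, length} pairs for one file. minSize must be at least 1.
    static List<long[]> splits(long fileLength, int requestedMaps, long minSize, long blockSize) {
        long goalSize = fileLength / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        List<long[]> result = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    public static void main(String[] args) {
        // Two maps requested, but the block size caps each split, so more splits result.
        System.out.println(splits(1000L, 2, 1L, 300L).size());
    }
}
```

    Under this model, asking for fewer maps than there are blocks still yields roughly one split per block, while asking for more maps shrinks the splits below the block size.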

    Read the article

  • Help regarding no sql databases like hadoop, hbase etc

    - by user560370
    I am new to distributed NoSQL databases like Hadoop, Cassandra, etc. I have a few questions on which I seek expert advice:
      1. Can you list the problems/challenges one will generally face when making a shift from a conventional database like MySQL to these large cluster-based databases?
      2. What are the difficulties, if any, when one needs to adapt to a newer version of these open source projects?
      3. Can you list the things which are generally kept in memcached for fast rendering of the page?
      4. How can I understand the source code of open-source projects so that I can build on it and maybe give back to the community?
    The above questions may sound idiotic and basic, but I kindly request the experts to answer them in detail and to the best of their abilities.

    Read the article

  • My Take on Hadoop World 2011

    - by Jean-Pierre Dijcks
    I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple: he addressed some real use cases outside of internet and ad platforms. The following are my notes; since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point…

    On the use cases that were mentioned:
      - ETL – how can I do complex data transformation at scale
      - Doing Basel III liquidity analysis
      - Private banking – transaction filtering to feed [relational] data marts
      - Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere
      - 360 Degree view of customers – become pro-active and look at events across lines of business. For example, make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active in servicing the customer
      - Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies]
      - Data Mining
      - Bypass data engineering [I interpret this as running on a large data set rather than on samples]
      - Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur, grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forgot my password, or is it someone in some other country trying to guess passwords?
      - Trade quality analysis – do a batch analysis of all trades done and run them through an analysis or comparison pipeline

    One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI tools, and Big Data reporting and tools should work with those tools rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping – was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years.

    Another interesting session was the one from Disney, discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with database technologies. I thought this was one of the best sessions I have seen in a long time. It discussed real use cases, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases:
      - Determine the Strategy – design a platform and evangelize this within the organization
      - Focus on the people – hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience
      - Work on Execution of the strategy – implement the Hadoop platform next to the other technologies and work toward the DaaS platform

    This kind of fitted with some of the LinkedIn comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform.

    One general observation: I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other, I got the impression that the capabilities of today’s relational databases are underestimated, mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models.

    All in all I liked this conference. It was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

    Read the article

  • Streaming audio to mobile phones, what technology to use ?

    - by Alx
    I'm planning on building an application where audio media is going to be streamed to a mobile phone for the user to listen to. The targets are smartphones: iPhone/Blackberry/Android/(J2ME?). I see that streaming on iPhone has to be done with HTTP Live Streaming, but I don't see it supported by other platforms. Should I broadcast the streams via RTSP? HTTP? Is there any way to use a unified solution for all the different mobile platforms? If anyone has already had to go through this, help would be greatly appreciated.

    Read the article

  • Hadoop/MapReduce: Reading and writing classes generated from DDL

    - by Dave
    Hi, Can someone walk me through the basic workflow of reading and writing data with classes generated from DDL? I have defined some struct-like records using DDL. For example:

        class Customer {
            ustring FirstName;
            ustring LastName;
            ustring CardNo;
            long LastPurchase;
        }

    I've compiled this to get a Customer class and included it in my project. I can easily see how to use this as input and output for mappers and reducers (the generated class implements Writable), but not how to read it from and write it to a file. The JavaDoc for the org.apache.hadoop.record package talks about serializing these records in Binary, CSV or XML format. How do I actually do that? Say my reducer produces IntWritable keys and Customer values. What OutputFormat do I use to write the result in CSV format? What InputFormat would I use to read the resulting files in later, if I wanted to perform analysis over them?

    Read the article

  • Need help implementing this algorithm with map Hadoop MapReduce

    - by Julia
    Hi all! I have an algorithm that goes through a large data set, reads some text files and searches for specific terms in their lines. I have it implemented in Java, but I didn't want to post code so that it doesn't look like I am searching for someone to implement it for me. But it is true that I really need a lot of help! This was not planned for my project, but the data set turned out to be huge, so my teacher told me I have to do it like this.

    EDIT (I did not clarify this in the previous version): The data set I have is on a Hadoop cluster, and I should make a MapReduce implementation of the algorithm.

    I was reading about MapReduce and thought that I would first do the standard implementation and that it would then be more or less easy to do it with MapReduce. But that didn't happen: the algorithm is quite simple and nothing special, yet I can't wrap my mind around MapReduce. So here is the pseudocode of my algorithm in short:

        LIST termList   (there is a method that creates this list from a Lucene index)
        FOLDER topFolder

        INPUT topFolder
        IF it is a folder and not empty
            list files (there are 30 subfolders inside)
            FOR EACH subfolder
                GET file "CheckedFile.txt"
                analyze(CheckedFile)
            ENDFOR
        ENDIF

        Method ANALYZE(CheckedFile)
            read CheckedFile
            WHILE CheckedFile has next line
                GET line
                FOR (loops through termList)
                    GET third word from line
                    IF third word = term from list
                        append whole line to string buffer
                    ENDIF
                ENDFOR
            ENDWHILE
            OUTPUT string buffer to file

    Also, as you can see, each time "analyze" is called a new file has to be created, and I understood that it is difficult to write to many outputs with MapReduce? I understand the MapReduce intuition, and my example seems perfectly suited for MapReduce, but when it comes to doing this, I obviously do not know enough and I am STUCK! Please, please help.
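
    The per-line check in ANALYZE depends only on the current line and the term list, which is what makes it a natural candidate for a map step. The sketch below shows just that check in plain Java, with no Hadoop classes. The names are hypothetical, words are assumed to be separated by whitespace, and the term list is held in a set so the inner loop over termList becomes a single lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the per-line filter from the pseudocode.
final class ThirdWordFilterSketch {

    // True if the line has at least three words and the third one is a known term.
    static boolean matches(String line, Set<String> terms) {
        String[] words = line.trim().split("\\s+");
        return words.length >= 3 && terms.contains(words[2]);
    }

    // Keeps the whole line whenever its third word is in the term set.
    static List<String> filter(List<String> lines, Set<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (matches(line, terms)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("hadoop", "hive"));
        List<String> lines = Arrays.asList("a b hadoop d", "x y z", "short line");
        System.out.println(filter(lines, terms));
    }
}
```

    In a MapReduce job, `matches` would be the body of the mapper: each input line is either emitted unchanged or dropped, and no reduce logic is needed for the filtering itself.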

    Read the article

  • Microsoft’s 22tracks Music Service now Available in All Browsers

    - by Akemi Iwaya
    Are you tired of listening to the same old music and looking for something new to listen to? Then 22tracks from Microsoft is definitely worth a look! This online music service is available in your favorite browser, does not require an account to use, and lets you listen to music from multiple international sources! If you are curious about 22tracks, then the following excerpt and video sum up the service very nicely. From the blog post: The concept behind 22tracks is simple: 22 local top DJs from cities like Amsterdam, Brussels, London and Paris share their genre’s 22 hottest tracks of the moment. Each city boosts its own team of specialized DJs bringing you the newest tracks in their genre. When you get ready to select (or change to) another set of tracks, just click on the desired city at the top of the browser window, then click on the appropriate set from the drop-down list.
      - 22tracks Homepage
      - 22tracks and Internet Explorer team up to bring you a completely new online music experience [22tracks Blog]
      - 22tracks about [YouTube]
    [via BetaNews and The Next Web]

    Read the article

  • Any good method for mounting Hadoop HDFS from another system?

    - by Beel
    I want to mount Cloudera Hadoop as a Linux file system over the LAN. As a setup, I already have the Hadoop cluster running on a set of Ubuntu machines. But now I need to be able to use it as a normal file system from a Fedora system over the LAN. I tried FUSE, but there are two problems:
      1. Cloudera says FUSE loses data (according to a comment by a Cloudera employee on the official Cloudera support site)
      2. I've had no success making it work the way we want
    As a point of clarification, I am using Hadoop ONLY for the file system, not for its other capabilities.

    Read the article

  • If I intend to use Hadoop is there a difference in 12.04 LTS 64 Desktop and Server?

    - by Charles Daringer
    Sorry for such a newbie question, but I'm looking at installing the M3 edition of MapR. The requirements are at this link: http://www.mapr.com/doc/display/MapR/Requirements+for+Installation My question is this: is the 64-bit Desktop kernel for 12.04 LTS adequate, or the "same" as the Server version of the product? If I'm setting up a lab to attempt to install a home cluster environment, should I start with the Server edition or dual boot that distribution? My assumption is that the two are the same, and that I can add any additional software to the 64-bit Desktop as needed. Can anyone elaborate on this? Have I missed something obvious?

    Read the article

  • Unable to Turn On Media Streaming in Windows Media Player 12 on Windows 7

    - by Chau Chee Yang
    I have 2 PCs with Windows 7 and Media Player 12 installed. I would like to use the Play To feature between the two PCs, which are connected via LAN. Both PCs (A and B) run Media Player in a standard user account. I am able to turn on the media streaming option on PC A (with a privilege access prompt) without any problem. However, PC B also prompts for privilege access but does not respond after I enter the administrator password. Both PCs followed the same configuration steps. I can "play to" PC A (in a standard user account) from another PC without any problem, but I can't "play to" PC B in a standard user account; on PC B, "play to" only works when Media Player runs in an administrator account. I have tried uninstalling and reinstalling Media Player via "Programs and Features" in Control Panel on PC B, but that doesn't work either. Has anyone had a similar experience of failing to turn on media streaming when running Windows Media Player in a standard user account?

    Read the article
