Search Results

Search found 3 results on 1 page for 'grautur'.

  • Why do HDFS clusters have only a single NameNode?

    - by grautur
    I'm trying to better understand how Hadoop works, and I'm reading the following from http://wiki.apache.org/hadoop/NameNode:

    "The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy. Hadoop 0.21+ has a BackupNameNode that is part of a plan to have an HA name service, but it needs active contributions from the people who want it (i.e. you) to make it Highly Available."

    So why is the NameNode a single point of failure? What is bad or difficult about having a complete duplicate of the NameNode running as well?
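
The checkpointing the wiki describes can be pictured with a toy model: the fsimage is a snapshot of the namespace, the edits file logs every change made since that snapshot, and a checkpoint replays the log onto the snapshot to produce a new one. This is a hypothetical simplification; the `checkpoint` function, the dictionary layout, and the three operation names are invented for illustration and are not HDFS's on-disk formats or APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model, not HDFS code: the "fsimage" maps path -> kind,
# and each edit-log entry is an (operation, path) pair.
Edit = Tuple[str, str]

def checkpoint(fsimage: Dict[str, str], edits: List[Edit]) -> Dict[str, str]:
    """Replay the edit log onto a copy of the image and return the merged image."""
    image = dict(fsimage)  # leave the previous checkpoint untouched
    for op, path in edits:
        if op == "mkdir":
            image[path] = "dir"
        elif op == "create":
            image[path] = "file"
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return image

old_image = {"/": "dir", "/logs": "dir"}
edit_log = [("create", "/logs/a.txt"), ("mkdir", "/tmp"), ("delete", "/logs/a.txt")]
new_image = checkpoint(old_image, edit_log)
print(new_image)  # {'/': 'dir', '/logs': 'dir', '/tmp': 'dir'}
```

The merged image only reflects the edits that existed when the checkpoint ran, and nothing in this process serves client requests, which is consistent with the wiki's point that the SecondaryNameNode provides no real redundancy.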

  • "ssh_exchange_identification: Connection closed by remote host lost connection" when running cron job

    - by grautur
    I have a Ruby script that connects to a remote machine via ssh and executes a command. The script runs fine when I run it in my terminal. My crontab contains:

        1 * * * * /bin/bash -l -c 'ruby myfile.rb'

    If I run /bin/bash -l -c 'ruby myfile.rb' myself, everything executes fine. But when cron itself executes the job, I get an "ssh_exchange_identification: Connection closed by remote host" error. What's the cause of this, and how do I fix it?
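
For readers less familiar with crontab syntax: the five leading fields of `1 * * * *` are minute, hour, day of month, month, and day of week, so the job runs once an hour at one minute past the hour, not every minute. The sketch below uses a hypothetical `cron_matches` helper (a deliberately simplified matcher, not how cron itself is implemented) to enumerate the firing times over one day.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified cron matcher: each field is either "*" or a single
# integer (no ranges, lists, or steps), and cron's special OR rule for when both
# day-of-month and day-of-week are restricted is not modeled.
def field_matches(field: str, value: int) -> bool:
    return field == "*" or int(field) == value

def cron_matches(expr: str, when: datetime) -> bool:
    """True if a 5-field expression (minute hour day-of-month month day-of-week) fires at `when`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python weekday(): 0 = Monday
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, cron_dow))

# Walk through every minute of one day and keep the minutes where the job would run.
start = datetime(2024, 1, 1)
fires = [start + timedelta(minutes=m) for m in range(24 * 60)
         if cron_matches("1 * * * *", start + timedelta(minutes=m))]
print(len(fires), fires[0].strftime("%H:%M"), fires[-1].strftime("%H:%M"))  # 24 00:01 23:01
```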

  • Relational vs. Dimensional Databases, what's the difference?

    - by grautur
    I'm trying to learn about OLAP and data warehousing, and I'm confused about the difference between relational and dimensional modeling. Is dimensional modeling basically relational modeling, but allowing for redundant/unnormalized data?

    For example, let's say I have historical sales data on (product, city, # sales). I understand that the following would be a relational point of view:

        Product | City          | # Sales
        Apples  | San Francisco | 400
        Apples  | Boston        | 700
        Apples  | Seattle       | 600
        Oranges | San Francisco | 550
        Oranges | Boston        | 500
        Oranges | Seattle       | 600

    While the following is a more dimensional point of view:

        Product | San Francisco | Boston | Seattle
        Apples  | 400           | 700    | 600
        Oranges | 550           | 500    | 600

    But it seems like both points of view would nonetheless be implemented in an identical star schema:

        Fact table: Product ID, City ID, # Sales
        Product dimension: Product ID, Product Name
        City dimension: City ID, City Name

    And it's not until you start adding details to each dimension that the differences start to appear. For instance, if you wanted to track regions as well, a relational database would tend to have a separate region table in order to keep everything normalized:

        City dimension: City ID, City Name, Region ID
        Region dimension: Region ID, Region Name, Region Manager, # Regional Stores

    A dimensional database, on the other hand, would allow denormalization, keeping the region data inside the city dimension to make it easier to slice the data:

        City dimension: City ID, City Name, Region Name, Region Manager, # Regional Stores

    Is this correct?
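
The star schema described in the question can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table and column names are invented, and the region assignments (San Francisco and Seattle in a "West" region, Boston in "East") are hypothetical, since the question gives no region data. It loads the sales figures into a single fact table, produces the product-by-city pivot from that same table, and compares a normalized city/region pair of tables with a denormalized city dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE region_dim  (region_id INTEGER PRIMARY KEY, region_name TEXT);
-- normalized: region details live only in region_dim
CREATE TABLE city_dim (city_id INTEGER PRIMARY KEY, city_name TEXT,
                       region_id INTEGER REFERENCES region_dim);
-- denormalized: the region name is copied onto every city row
CREATE TABLE city_dim_flat (city_id INTEGER PRIMARY KEY, city_name TEXT, region_name TEXT);
CREATE TABLE sales_fact (product_id INTEGER, city_id INTEGER, sales INTEGER);
""")
cur.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Apples"), (2, "Oranges")])
# Hypothetical region assignments, for illustration only
cur.executemany("INSERT INTO region_dim VALUES (?, ?)", [(1, "West"), (2, "East")])
cur.executemany("INSERT INTO city_dim VALUES (?, ?, ?)",
                [(1, "San Francisco", 1), (2, "Boston", 2), (3, "Seattle", 1)])
cur.execute("""INSERT INTO city_dim_flat
               SELECT c.city_id, c.city_name, r.region_name
               FROM city_dim c JOIN region_dim r USING (region_id)""")
cur.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    (1, 1, 400), (1, 2, 700), (1, 3, 600),
    (2, 1, 550), (2, 2, 500), (2, 3, 600),
])

# The "dimensional" grid (one column per city) is just a pivot over the same fact table
pivot = cur.execute("""
    SELECT p.product_name,
           SUM(CASE WHEN c.city_name = 'San Francisco' THEN f.sales END),
           SUM(CASE WHEN c.city_name = 'Boston' THEN f.sales END),
           SUM(CASE WHEN c.city_name = 'Seattle' THEN f.sales END)
    FROM sales_fact f
    JOIN product_dim p USING (product_id)
    JOIN city_dim c USING (city_id)
    GROUP BY p.product_name ORDER BY p.product_name""").fetchall()

# Sales by region with the normalized tables: one extra join
by_region_normalized = cur.execute("""
    SELECT r.region_name, SUM(f.sales) FROM sales_fact f
    JOIN city_dim c USING (city_id)
    JOIN region_dim r USING (region_id)
    GROUP BY r.region_name ORDER BY r.region_name""").fetchall()

# Same question against the denormalized dimension
by_region_flat = cur.execute("""
    SELECT c.region_name, SUM(f.sales) FROM sales_fact f
    JOIN city_dim_flat c USING (city_id)
    GROUP BY c.region_name ORDER BY c.region_name""").fetchall()

print(pivot)                 # [('Apples', 400, 700, 600), ('Oranges', 550, 500, 600)]
print(by_region_normalized)  # [('East', 1200), ('West', 2150)]
print(by_region_flat)        # [('East', 1200), ('West', 2150)]
```

Both region queries return the same totals; the denormalized dimension only saves a join, at the cost of repeating each region's name on every city row.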
