Search Results

  • Join multiple filesystems (on multiple computers) into one big volume

    - by jm666
    Scenario: I have 10 computers, each with 12x2TB HDDs (currently) in a RAID-Z2 (10+2) configuration, so each computer has one volume of approx. 20TB. Now I need to join those 10 separate computers (separate RAID groups) into one big volume. What is the recommended solution? I'm thinking about FCoE (10Gb Ethernet), i.e. buying an FCoE (10Gb Ethernet) card for each computer. What else do I need on the hardware side? (Probably another computer, and an FCoE switch such as a Cisco Nexus?) The main question is: what do I need to install and configure on each computer? Currently they run FreeBSD with RAID-Z2, but I could switch to Linux or Solaris if needed. Any helpful resource that explains how to build big volumes from smaller RAID groups (on the software side) is very welcome: which OS, which filesystem, which software, etc. In short: I want to get one approx. 200TB storage pool (in one filesystem) from the already existing computers/storage. I don't need fast writes, but I need good read performance (it will be a big fileserver), and it should work transparently: when storing data I don't want to care which computer the data goes to (e.g. not 10 mountpoints, but one big logical filesystem). Thanks.

  • SQL – Quick Start with Admin Sections of NuoDB – Manage NuoDB Database

    - by Pinal Dave
    In yesterday’s blog post we saw that it is extremely easy to install the NuoDB database on your local machine. Now that the application is properly set up, let us explore NuoDB a bit more, get familiar with how it works, and look at the important areas of NuoDB that you should learn. As we have already installed NuoDB, we will now quickly start with two of the important areas in NuoDB: 1) Admin and 2) Explorer. In this blog post I will explore how the Admin section of the NuoDB Console works. In the next blog post we will learn how the Explorer section works. Let us go to the NuoDB Console by typing the following URL in your browser: http://localhost:8080/ It will bring you to the QuickStart screen, where you can see a big Start QuickStart button. Click on the button and it will bring you to the next screen, which contains very important information about Domain and Database Settings. Many of us have the habit of clicking Continue without reading what is written on the screen; because we are familiar with most wizards, we can easily miss a very important message. Please note the Domain Settings and Database Settings on this screen before clicking on Create Database. Domain Settings: User: quickstart, Password: quickstart. Database Settings: User: dba, Password: goalie, Database: test, Schema: HOCKEY. Once you click on the Create Database button it will immediately start creating the sample database. First, it will start a Storage Manager and right after that it will start a Transaction Engine. Once the engine is up, it will create a schema and sample data. Once the sample database has been created successfully, it will show a success screen. Now is the time to explore the NuoDB Admin or NuoDB Explorer. If you click on Admin, it will first show a login screen. Enter “domain” as the username and “bird” as the password. 
Alternatively, you can enter “quickstart” for both the username and the password; that works as well. Once you enter the Admin section, on the left side you can see information about NuoDB and the Admin Console, and on the right side you can see the domain overview area. From this administrative section you can do any of the following tasks: create a view of the entire domain; add and remove databases; start and stop NuoDB Transaction Engines and Storage Managers; and monitor transactions across all the NuoDB databases. On the right side of the Admin section we can see various information about a particular NuoDB domain. You can quickly view various alerts, find out the number of host machines provisioned for the domain, and see the number of databases and processes running in the domain. If you click on the “1 host” link you will be able to see various processes, CPU usage and other information. In the Processes section you can see that there are two different types of processes. The first process (marked with the floppy drive icon) represents a running Storage Manager process, and the second represents a running Transaction Engine process. You can click on the links for the Storage Manager and Transaction Engine to see further statistical details, right down to the last byte of the data. There are various charts available for analysis as well. I think the product is quite mature, and the user can add different monitor charts to the Admin section. Additionally, the Admin section is the place where you can create and manage new databases. I hope today’s tutorial gives you enough confidence to try out NuoDB and check out various administrative activities with the database. I am personally impressed with their dashboard of various counters. 
For more information about how the NuoDB architecture works and what a Storage Manager or Transaction Engine does, check out this short video with NuoDB CTO Seth Proctor. In the next blog post, we will try out the Explorer section of NuoDB, which allows us to run SQL queries and write SQL code. Meanwhile, I strongly suggest you download and install NuoDB and get familiar with the product. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

  • Does my AMD-based machine use little endian or big endian?

    - by Frank
    I'm going through a computer systems course and I'm trying to establish, for sure, whether my AMD-based computer is a little-endian machine. I believe it is, because it should be Intel-compatible. Specifically, my processor is an AMD Athlon 64 X2. I understand that this can matter in C programming. I'm writing C programs and a method I'm using would be affected by this. I'm trying to figure out whether I'd get the same results if I ran the program on an Intel-based machine (assuming that is a little-endian machine). Finally, let me ask this: would any and all machines capable of running Windows (XP, Vista, 2000, Server 2003, etc.) and, say, Ubuntu Linux desktop be little-endian? Thank you, Frank
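    One way to confirm the byte order empirically is to look at how a known multi-byte integer is laid out in memory. The sketch below uses Python's standard struct and sys modules rather than C, only to keep the examples on this page in one language; the C equivalent is to store 1 in a uint32_t and read its first byte through an unsigned char pointer. The function names are illustrative. For reference, x86 and x86-64 processors from both AMD and Intel are little-endian.

```python
import struct
import sys


def detect_byte_order() -> str:
    """Return 'little' or 'big' by inspecting how the integer 1 is stored natively."""
    # '=I' packs an unsigned 32-bit int in native byte order with standard size.
    raw = struct.pack("=I", 1)
    # On a little-endian CPU the least significant byte (0x01) is stored first.
    return "little" if raw[0] == 1 else "big"


def show_layout(value: int) -> str:
    """Hex dump of a 32-bit value as it sits in memory on this machine."""
    return struct.pack("=I", value).hex(" ")


if __name__ == "__main__":
    print("detected byte order: ", detect_byte_order())
    print("sys.byteorder:       ", sys.byteorder)
    print("0x01020304 in memory:", show_layout(0x01020304))
```

    On an Athlon 64 X2 (or any other x86/x86-64 CPU) both lines should report little, and 0x01020304 should appear in memory as 04 03 02 01.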

  • Help with algorithmic complexity in custom merge sort implementation

    - by bitcycle
    I've got an implementation of merge sort in C++ using a custom doubly linked list. I'm coming up with a big-O complexity of O(n^2), based on the merge_sort() slice operation. But, from what I've read, this algorithm should be O(n log n), where the log has a base of two. Can someone help me determine whether I'm just analyzing the complexity incorrectly, or whether the implementation can/should be improved to achieve O(n log n) complexity? If you would like some background on my goals for this project, see my blog. I've added comments in the code outlining what I understand the complexity of each method to be. Clarification: I'm focusing on the C++ implementation with this question. I've got another implementation written in Python, but that was added in addition to my original goal(s).
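    The asker's code isn't reproduced in this excerpt, so the sketch below can't pinpoint the problem; it only shows a linked-list merge sort whose cost is easy to account for. Each call does O(n) work outside its recursive calls (finding the middle with slow/fast pointers, then merging by relinking nodes without copying), which gives T(n) = 2T(n/2) + O(n) and therefore O(n log n). By contrast, if a slice step walks from the head once per element it copies (index-based access on a linked list), a single split already costs O(n^2). It is written in Python rather than C++ to keep this page's examples in one language, and the Node class and function names are hypothetical.

```python
from typing import Iterable, List, Optional


class Node:
    """Hypothetical doubly linked list node (not the original poster's class)."""
    __slots__ = ("value", "prev", "next")

    def __init__(self, value: int) -> None:
        self.value = value
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


def from_list(values: Iterable[int]) -> Optional[Node]:
    head: Optional[Node] = None
    tail: Optional[Node] = None
    for v in values:
        node = Node(v)
        if tail is None:
            head = node
        else:
            tail.next, node.prev = node, tail
        tail = node
    return head


def to_list(head: Optional[Node]) -> List[int]:
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def split(head: Node) -> Optional[Node]:
    """Cut the list in half in O(n) with slow/fast pointers; return the second half."""
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    second = slow.next
    slow.next = None
    return second


def merge(a: Optional[Node], b: Optional[Node]) -> Optional[Node]:
    """Merge two sorted lists by relinking nodes: O(len(a) + len(b)), no copying."""
    dummy = Node(0)
    tail = dummy
    while a is not None and b is not None:
        if a.value <= b.value:  # <= keeps equal elements in order (stable)
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a if a is not None else b
    return dummy.next


def merge_sort(head: Optional[Node]) -> Optional[Node]:
    """T(n) = 2T(n/2) + O(n) => O(n log n). Only next pointers are maintained here."""
    if head is None or head.next is None:
        return head
    second = split(head)
    return merge(merge_sort(head), merge_sort(second))


def sort_doubly_linked(head: Optional[Node]) -> Optional[Node]:
    """Sort, then restore the prev pointers in one extra O(n) pass."""
    head = merge_sort(head)
    prev, node = None, head
    while node is not None:
        node.prev = prev
        prev, node = node, node.next
    return head
```

    Fixing up the prev pointers once at the end adds only O(n), so it does not change the overall bound.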

  • How does Google store search trends in backend?

    - by Achshar
    Google Trends shows how many times a query has been searched, along with some other properties of that query. But how is this data stored in a database? Storing a new row for every search does not seem right. They also plot the query on a time graph, so they must have some way to look up individual searches made by users, but given the number of queries they get every day, it does not seem feasible that they store every search in a database row along with a timestamp. This does not apply just to Google Trends or Google in general, but to any other big site that gets an enormous number of queries and then has tools to analyze them in depth. I am not an expert on this, but I am interested in a high-level picture of how things work behind the scenes.
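    Google has not published the exact storage design behind Trends, so the sketch below only illustrates one common technique for this kind of problem: instead of storing a row per search, pre-aggregate a count per (normalized query, time bucket) as searches stream in. Storage then grows with the number of distinct queries per bucket, not with the number of searches, and a time graph is just the sorted list of buckets for one query. The hourly bucket size and the class and function names are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

BUCKET_SECONDS = 3600  # assumption: hourly buckets


def normalize(query: str) -> str:
    """Collapse case and whitespace so 'Big  Data' and 'big data' share one counter."""
    return " ".join(query.lower().split())


def bucket_start(ts: float) -> int:
    """Round a Unix timestamp down to the start of its bucket."""
    t = int(ts)
    return t - t % BUCKET_SECONDS


class TrendCounter:
    """Stores one count per (query, time bucket) instead of one row per search."""

    def __init__(self) -> None:
        # normalized query -> Counter mapping bucket start -> number of searches
        self.by_query: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, query: str, ts: float) -> None:
        """Called once per incoming search; it only increments a counter cell."""
        self.by_query[normalize(query)][bucket_start(ts)] += 1

    def series(self, query: str) -> List[Tuple[int, int]]:
        """(bucket_start, count) pairs for one query, oldest first, ready to plot."""
        counts = self.by_query.get(normalize(query), Counter())
        return sorted(counts.items())

    def cells_stored(self) -> int:
        """Number of (query, bucket) cells kept, i.e. the storage footprint."""
        return sum(len(c) for c in self.by_query.values())
```

    Recording any number of searches for the same query within one hour still yields a single cell. A system at that scale would additionally need to shard the counters across machines and roll old hourly cells up into coarser ones, which this sketch does not attempt.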

  • Was a Big Fish in a Little Pond, Am Now a Little Fish in a Big Pond. How Do I Grow? [closed]

    - by Ziv
    I finished high school in the top three of my class, then studied a little, and there too I was pretty much a big fish in a bigger pond than high school. Now I've started my first job at a very big company. There are some incredibly talented programmers and researchers here (mostly in departments not related to mine), and for the first time I really feel incredibly average. I do not want to be average. I read technical books all the time and I try to code in my personal time, but I don't feel like that's enough. What can I do to become a leading programmer again in this big company? Is there anything specific I can do to make myself known here? This is a very big company, so in order to advance you must be very good and shine in your field.

  • Git on DreamHost still balking on big files even after I compiled with NO_MMAP=1

    - by fuzzy lollipop
    I compiled Git 1.7.0.3 on DreamHost with the NO_MMAP=1 option, and I also supplied that option when I ran "make NO_MMAP=1 install". My paths are set up correctly: which git reports my ~/bin dir, which is correct, and git --version returns the correct version. But when I try to do a "git push origin master" with "big" files (~150MB), it always fails. Does anyone have any suggestions on how to get DreamHost to accept these "big" files from a git push?

  • Willy Rotstein on Analytics and Social Media in Retail

    - by sarah.taylor(at)oracle.com
    Recently I came across a presentation by Dan Zarrella on "The Science of Retweets" (http://www.slideshare.net/HubSpot/the-science-of-retweets-with-dan-zarrella). It is an insightful, fact-based analysis of how tweets propagate and what makes them successful. The analysis is of course very interesting for those of us interested in tweeting. However, what really caught my attention is how well it illustrates, from a very different angle, some of the issues I am discussing with retailers these days, in particular the opportunities that e-commerce and social media open up for retailers with the appetite and vision to tackle the associated analytical challenges. And these challenges are of course not straightforward. In his presentation Dan introduces the concept of Observability. I haven't had the opportunity to discuss his specific definition of the term with Dan. However, in practical retail terms, I would say that it means that through social media (and other web channels such as search) we can analyze and track processes by measuring indicators that were not measurable before. The focus is on identifying patterns across a large number of consumers rather than on what a particular individual "Likes". The potential impact for retailers is huge. It opens the opportunity to monitor changes in consumer preference and plan the business accordingly. And you can do this almost in "real time" rather than through infrequent surveys that provide a "rear view" picture of your consumer behaviour. For instance, you could envision identifying when a particular set of fashion styles is breaking out from the pack, and committing to a re-buy. Or you could monitor when the preference for a specific mobile device has declined and markdowns should therefore be considered, or how demand for a specific ready-made food typically flows across regions, and manage the inventory accordingly. Search, blogging, website and store data may need to be considered in identifying these trends. 
The data volumes involved are huge (check Andrea Morgan's recent post on "Big Data" in retail), but so are the benefits. As Andrea says, for the first time we can start getting insight into "why" the business is performing in a certain way rather than just reporting on what is happening. And it is not just about the data volumes. Tackling the challenge also calls for integrated planning systems that can bring data and insight into the context of the decision-making process that buyers, merchandisers and supply chain managers follow. I strongly believe that only when data and process come together can you move from the anecdotal to systematically improving business performance. I would love to hear your opinions on these trends and where you think retail is heading to exploit these topics; please email me.

  • SQL – Contest to Get The Date – Win USD 50 Amazon Gift Cards and Cool Gift

    - by Pinal Dave
    If you are a regular reader of this blog, you will have no trouble at all solving this puzzle. This contest is based on my experience with NuoDB. If you are not familiar with NuoDB, here are a few pointers for you: Step by Step Guide to Download and Install NuoDB – Getting Started with NuoDB; Quick Start with Admin Sections of NuoDB – Manage NuoDB Database; Quick Start with Explorer Sections of NuoDB – Query NuoDB Database. In today’s contest you have to answer the following questions. Q 1: Precision of NOW(). What is the precision of NuoDB’s NOW() function, which returns the current date and time? Hint: Run the following script in the Explorer section of the NuoDB Console: SELECT NOW() AS CurrentTime FROM dual; Here is the image. I have masked the area where the time precision is displayed. Q 2: Executing Date and Time Script. When I execute the following script: SELECT 'today' AS Today, 'tomorrow' AS Tomorrow, 'yesterday' AS Yesterday FROM dual; I get the following result. Now, what will be the result when we execute the following script, and why? SELECT CAST('today' AS DATE) AS Today, CAST('tomorrow' AS DATE) AS Tomorrow, CAST('yesterday' AS DATE) AS Yesterday FROM dual; HINT: Install NuoDB (it takes 90 seconds). Prizes: 2 Amazon gift cards; 2 limited-edition hoodies (US residents only). Rules: Please leave an answer in the comments section below. You must answer both questions together in a single comment. US residents who want to qualify to win NuoDB apparel should mention their country in the comment. You can resubmit your answer multiple times; the latest entry will be considered valid. The last day to participate in the puzzle is June 24, 2013. All valid answers will be kept hidden until June 24, 2013. The winners will be announced on June 25, 2013. Two winners will each get a USD 25 Amazon gift card (total value = 25 x 2 = 50 USD). The winners will be selected using a random algorithm from all the valid answers. Anybody with a valid email address can take part in the contest. 
Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

  • SQL – Biggest Concerns in a Data-Driven World

    - by Pinal Dave
    The ongoing chaos over a government agency’s snooping has ignited a heated debate on the privacy of personal data and its use by government and/or other institutions. It has created a feeling of disapproval and distrust among users. This incident is a lesson for companies that are looking to grow their business with a data-driven approach. According to analysts, the goal of gathering personal information should be to deliver benefits to both parties: the user as well as the data collector (government or business). Using data the right way is crucial, and companies need to deploy the right software applications and systems to ensure that their efforts are well directed. However, analysts face various issues with the available software, which are highlighted below. According to an InformationWeek 2013 Survey of Analytics, Business Intelligence and Information Management, in which 541 business technology professionals responded, the biggest concern was the scarcity of expertise and the high costs associated with it. This concern was voiced by as many as 38% of the participants. A close second was the expense of data warehouse appliance platforms, with 33% of respondents seeing it as a huge roadblock. The survey also revealed that 31% of professionals weren’t even sure how data analytics can create business opportunities for them. Another 17% said that they found data platform technologies such as Hadoop and NoSQL hard to learn. These results clearly show that there are awareness and expertise issues that also need much attention. Unless the demand-supply gap of Business Intelligence professionals well versed in data analysis technologies is closed, this divide is going to affect how companies make the most of their BI campaigns. 
One of the key actions that can be taken to salvage the situation is to provide training on data analytics concepts. Koenig Solutions offers courses on many such technologies, including a course on MCSE SQL Server 2012: BI Platform. So it’s time to brush up your skills and get down to work in the data-driven world that awaits you. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

  • Welcome to the Oracle Retail International Blog

    - by sarah.taylor(at)oracle.com
    Welcome to the first post of the new Oracle Retail International Blog. Retail is an international business, and today's successful retailers view themselves in the context of a global market. A niche fashion business in Tokyo will learn marketing strategies from the luxury brands of Milan, an independent grocer in Oslo will source the same global brands as a supermarket in Oklahoma, and every retailer in the world will measure their multi-channel operation against the international e-commerce giant Amazon. Why? Because today's customer is a global customer with unparalleled expectations on choice, price and service. Today's consumers have access to more information on retail than ever before. Technology allows people to shop from their home, their office or the phone in their pocket, wherever they are and at whatever time suits them. Customers are using the web to search for products and promotions. They are also using the web to find their voice, commenting on products and services that have delighted or disappointed them. In an information-rich industry, this customer element creates a new world of data. The best retailers are developing eagle eyes for reading customer activity and turning it into profitable decisions. Ultimately, whether you choose to compete or shop on price, service, product innovation, excellent operations or all of the above, the international world of retail has become an inspiration for all, retailer and consumer alike. Retail as an industry is growing and diversifying at a faster rate than ever before. Yet it is still the customer who picks the winners and the losers on the retail field. Economic circumstances transform the rules, but it is still the customer who dictates the game, the pace, the price, and the perception of the brand. Wise retailers never rest on their laurels. 
They are always shopping for ideas on how to improve and differentiate the offer at every touch point, to meet the customer's needs better than anyone else and to gain each customer's loyalty at a time when loyalty can be cheap. With this blog, I hope that we can provide a hub for discussion around what unifies retail and how technology supports both the retailer and the customer experience. Despite the competitive nature of this market, we hope that this will provide an opportunity to share experiences and lessons learnt, in the belief that knowledge can only help this industry grow and develop. At Oracle we've been supporting retailers for many years. Many of us have worked within retail organisations all over the world, myself included. With this in mind, I don't feel it is too bold a statement to say that Oracle understands retail. We wouldn't be so heavily integrated in some of the biggest and most well-known names in retail if we didn't. With this blog, we intend to create a community of international retailers that can exchange ideas and experiences, debate collective challenges and drive a better understanding of this continually evolving industry. Events such as the World Retail Congress and NRF's Big Show bring enormous value to the retail industry by providing platforms for discussion and learning, but they happen once a year. We wanted to create a platform for discussion on a different level, one that, like retail, is always on. We aim to be not only the infrastructure that brings all of a retail business's systems together, but also an infrastructure that helps the industry grow and flourish internationally by creating a platform for networking, discussion, creativity, vision and strategy. Please feel free to ask questions or comment using the comments functionality.
You might also want to visit our other Oracle Retail social media sites: Facebook - http://www.facebook.com/oracleretail; YouTube - http://www.youtube.com/user/oracleretail; Twitter - http://twitter.com/#!/oracleretail; Insight-Driven Retailing Blog - http://blogs.oracle.com/retail/

  • SQL – Quick Start with Explorer Sections of NuoDB – Query NuoDB Database

    - by Pinal Dave
    This is the third post in the series of blog posts I am writing about NuoDB. NuoDB is a very innovative and easy-to-use product. I can clearly see how one can scale out NuoDB with ease and confidence. In my very first blog post we discussed how to install NuoDB, and in my second post I discussed how to manage the NuoDB database transaction engines and storage managers with a few clicks. Note: You can download NuoDB from here. In this post, we will learn how to use the Explorer feature of NuoDB to do various SQL operations. NuoDB has a browser-based Explorer, which is very powerful and has many of the features any IDE would normally have. Let us see how it works in the following step-by-step tutorial. Let us go to the NuoDB Console by typing the following URL in your browser: http://localhost:8080/ It will bring you to the QuickStart screen. Make sure that you have created the sample database. If you have not created the sample database, click on Create Database and create it successfully. Now go to the NuoDB Explorer by clicking on the main tab, and it will ask you for your domain username and password. Enter “domain” as the username and “bird” as the password. Alternatively, you can enter “quickstart” as both the username and the password. Once you enter the password you will be able to see the databases. Because we installed the sample database, you will see the test database in the Database Hierarchy screen. When you click on the database it will ask for the database login. Note that the database login is different from the domain login, and you will have to enter your database credentials here. In our case the database username is dba and the password is goalie. Once you enter a valid username and password it will display your database. Expand your database further and you will notice various objects in it. Once you have explored the various objects, select any database and click on Open. 
When you click on Execute, it will display the SQL script to select the data from the table. The autogenerated script returns the entire result set from the table. The NuoDB Explorer is very powerful and makes the life of developers very easy. If you click on List SQL Statements it will list all the available SQL statements right away in the Query Editor. You can see the popup window in the following image. Here is the cool thing for geeks: you can even click on Query Plan and it will display the text-based query plan as well. For a simple SELECT, the query plan will be much simpler; when we write complex queries, however, it becomes very interesting. We can use the Query Plan tab for performance tuning of the database. Here is another feature: when we click on List Tables in NuoDB Explorer, it lists all the available tables in the query editor. This is very helpful when we are writing a long, complex query. Here is a relatively complex example I have built using INNER JOIN syntax. Right below it I have displayed the query plan. The query plan displays all the little details related to the query. So we just wrote a multi-table query and executed it against the NuoDB database. You can use the NuoDB Admin section to do various analyses of the query and its performance. NuoDB is a distributed database built on a patented emergent architecture with full support for SQL and ACID guarantees. It allows you to add Transaction Engine processes to a running system to improve its performance. You can also add a second Storage Manager to your running system for redundancy. Conversely, you can shut down processes when you don’t need the extra database resources. NuoDB also provides developers and administrators with a single intuitive interface for centrally monitoring deployments. If you have read my blog posts and have not tried out NuoDB, I strongly suggest that you download it today and learn along with me. 
Trust me: though the product is very powerful, it is extremely easy to learn and use. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

  • Building Simple Workflows in Oozie

    - by dan.mcclary
    Introduction

    More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered through that boredom, we should take steps to automate the process. We want to codify our work into repeatable units and create workflows that we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site.

    Hive Actions: Prepping for Pig

    In my parallel machine learning article, I use data from the National Climatic Data Center (NCDC) to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in handy, ready-to-model files with convenient delimiters. The truth is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie.

    I store the data from NCDC in HDFS and create an external Hive table partitioned by year. This gives me the flexibility of Hive's query language when I want it, but lets me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code.

        CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column1, column2)
        PARTITIONED BY (yr string)
        STORED AS ...
        LOCATION '/user/oracle/weather/historic';

    As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow.
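    Because a new partition is needed each time a year of NCDC data lands, the ALTER TABLE statement is a good candidate for generating rather than hand-editing. The helper below is a hypothetical sketch, not part of the original workflow: it builds the statement from a single year value, so the partition key and its yr=... directory cannot drift apart, and it assumes the /user/oracle/weather/historic layout from the table definition above. The output could be written to an .hql file and run by a Hive action.

```python
from typing import Iterable

HISTORIC_ROOT = "/user/oracle/weather/historic"  # table LOCATION from the DDL above


def add_partition_hql(year: int, table: str = "historic_weather",
                      root: str = HISTORIC_ROOT) -> str:
    """Build an idempotent ADD PARTITION statement whose value and path always agree."""
    if not 1900 <= year <= 9999:
        raise ValueError(f"implausible year: {year}")
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (yr='{year}') LOCATION '{root}/yr={year}';"
    )


def partitions_script(years: Iterable[int]) -> str:
    """One statement per distinct year, oldest first, one per line."""
    return "\n".join(add_partition_hql(y) for y in sorted(set(years)))


if __name__ == "__main__":
    print(partitions_script([2011, 2010, 2011]))
```

    IF NOT EXISTS keeps the statement safe to rerun, which matters once the same step executes every time the workflow does.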
    Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of its long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The layout is consistent from year to year, so writing a SerDe or a parser for the transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access.

        ALTER TABLE historic_weather ADD IF NOT EXISTS
          PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2010';

        INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history'
        SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day,
               w.temp, w.dewp, w.weather
        FROM (
          FROM historic_weather
          SELECT TRANSFORM(...)
          USING '/path/to/hive/filters/ncdc_parser.py'
          AS stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather
        ) w;

    Since I'm going to prepare training directories at least as often as I add partitions, I should also add that step to my workflow. Oozie invokes these Hive statements using what is, somewhat obviously, called a Hive action. A Hive action amounts to Oozie running a script file containing our query language statements, so we can place them in a file called ncdc_parse.hql.

    Starting Our Workflow

    Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions as workflow jobs, but they can be started automatically, either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling.
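    The Hive TRANSFORM step earlier in this section calls /path/to/hive/filters/ncdc_parser.py, which the article doesn't show. As a hedged sketch of what such a script can look like: Hive streams each input row to the script's stdin as tab-separated text and reads tab-separated rows back from its stdout. The byte offsets in FIELDS are hypothetical placeholders (the article leaves the real widths as "x bytes", "y bytes"), and treating the raw record as the first input column is also an assumption; check both against NCDC's format documentation and your table definition.

```python
import sys
from typing import Dict, Iterable, Iterator, Tuple

# HYPOTHETICAL layout: (output column, start offset, end offset) in the raw record.
# These offsets are placeholders, not the real NCDC widths; replace them.
FIELDS: Tuple[Tuple[str, int, int], ...] = (
    ("stn", 0, 6),
    ("wban", 7, 12),
    ("weather_year", 14, 18),
    ("weather_month", 18, 20),
    ("weather_day", 20, 22),
    ("temp", 24, 30),
    ("dewp", 35, 41),
    ("weather", 132, 138),
)


def parse_record(raw: str) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    return {name: raw[start:end].strip() for name, start, end in FIELDS}


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Hive TRANSFORM contract: tab-separated rows in, tab-separated rows out."""
    for line in lines:
        raw = line.rstrip("\n").split("\t", 1)[0]  # assumption: raw record is column 1
        if not raw.strip():
            continue  # skip blank input rows
        rec = parse_record(raw)
        # Emit columns in the same order as the AS clause of the Hive query.
        yield "\t".join(rec[name] for name, _, _ in FIELDS)


if __name__ == "__main__":
    for row in transform(sys.stdin):
        print(row)
```

    The output column order matches the AS stn, wban, weather_year, ... list in the INSERT OVERWRITE query, which is how Hive maps the script's output fields back to names.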
    The bare minimum for workflow XML defines a name, a starting point, and an end point:

        <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1">
          <start to="ParseNCDCData"/>
          <end name="end"/>
        </workflow-app>

    To this we need to add an action, and within that we'll specify the Hive parameters. Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure.

        <action name="ParseNCDCData">
          <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>localhost:8021</job-tracker>
            <name-node>localhost:8020</name-node>
            <configuration>
              <property>
                <name>oozie.hive.defaults</name>
                <value>/user/oracle/weather_ooze/hive-default.xml</value>
              </property>
            </configuration>
            <script>ncdc_parse.hql</script>
          </hive>
          <ok to="WeatherMan"/>
          <error to="end"/>
        </action>

    There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode; I have to include a hive-default.xml file; I have to include a script file; and the hive-default.xml and script file must be stored in HDFS.

    That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-default.xml files on different clusters (e.g. MySQL- or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need into it as you add actions to workflow.xml. At this point, our local directory should contain:

        workflow.xml
        hive-default.xml (make sure this file contains your metastore connection data)
        ncdc_parse.hql

    Adding Pig to the Ooze

    Adding our Pig script as an action is slightly simpler from an XML standpoint.
    All we do is add an action to workflow.xml as follows:

        <action name="WeatherMan">
          <pig>
            <job-tracker>localhost:8021</job-tracker>
            <name-node>localhost:8020</name-node>
            <script>weather_train.pig</script>
          </pig>
          <ok to="end"/>
          <error to="end"/>
        </action>

    Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My Pig script registers the Weka JAR and a chunk of Jython. If those aren't also in HDFS, our action will fail from the outset, but where do we put them? The Jython script goes into the working directory at the same level as the Pig script, because Pig attempts to load Jython files from the directory in which the script executes. However, that's not where our Weka JAR goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib is added to the Pig classpath automatically and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the Pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding JARs to the distributed cache from Oozie's Pig Cookbook.

    Making the Workflow Work

    We've got a workflow defined and have collected all the components we'll need to run it. But we can't run anything yet, because we still have to define some properties of the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server.
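    Before writing the job properties, it can be worth checking the finished workflow.xml by hand. oozie validate checks the file against Oozie's XML schema; as an extra, hypothetical sanity check (not an Oozie feature), the sketch below uses Python's standard xml.etree.ElementTree to confirm that every <start>, <ok>, and <error> transition points at a node that actually exists, which catches a misspelled action name before submission. The inline XML combines the start node, the two actions described in this article, and the end node into one document, with the Hive <configuration> block omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

WORKFLOW_XML = """\
<workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1">
  <start to="ParseNCDCData"/>
  <action name="ParseNCDCData">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>localhost:8021</job-tracker>
      <name-node>localhost:8020</name-node>
      <script>ncdc_parse.hql</script>
    </hive>
    <ok to="WeatherMan"/>
    <error to="end"/>
  </action>
  <action name="WeatherMan">
    <pig>
      <job-tracker>localhost:8021</job-tracker>
      <name-node>localhost:8020</name-node>
      <script>weather_train.pig</script>
    </pig>
    <ok to="end"/>
    <error to="end"/>
  </action>
  <end name="end"/>
</workflow-app>
"""


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]


def dangling_transitions(xml_text: str) -> List[str]:
    """Return 'source -> target' for every transition whose target node is undefined."""
    root = ET.fromstring(xml_text)
    # Named nodes are the direct children of <workflow-app> with a name attribute.
    defined: Set[str] = {c.get("name") for c in root if c.get("name")}
    problems: List[str] = []
    for node in root:
        kind = local(node.tag)
        if kind == "start":
            targets = [node.get("to")]
        elif kind == "action":
            targets = [t.get("to") for t in node if local(t.tag) in ("ok", "error")]
        else:
            continue  # end nodes have no outgoing transitions
        source = node.get("name", kind)
        problems += [f"{source} -> {t}" for t in targets if t not in defined]
    return problems
```

    To check a real file, pass it the text of workflow.xml from the local working directory. Only start and action nodes are handled; decision, fork, and join nodes would need similar cases.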
In the same working directory, we'll make a file called job.properties as follows:

    nameNode=hdfs://localhost:8020
    jobTracker=localhost:8021
    queueName=default
    weatherRoot=weather_ooze
    mapreduce.jobtracker.kerberos.principal=foo
    dfs.namenode.kerberos.principal=foo
    oozie.libpath=${nameNode}/user/oozie/share/lib
    oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot}
    outputDir=weather-ooze

While some of the pieces of the properties file are familiar (e.g., the JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath piece is extremely important. This is a directory in HDFS that holds Oozie's shared libraries: a collection of jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed, and write the output to the output directory.

We're finally ready to submit our job! After all that work we only need to do a few more things:

    1. Validate our workflow.xml
    2. Copy our working directory to HDFS
    3. Submit our job to the Oozie server
    4. Run our workflow

Let's do them in order. First, validate the workflow:

    oozie validate workflow.xml

Next, copy the working directory up to HDFS, so that it lands at the application path set in job.properties:

    hadoop fs -put working_dir /user/oracle/weather_ooze

Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument:

    oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit

We've submitted the job, but we don't see any activity on the JobTracker?
All I got was this funny bit of output:

    14-20120525161321-oozie-oracle

This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job:

    oozie job -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle

Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run". This will prep and run the workflow immediately.

Takeaway

So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.
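One detail that is easy to get wrong in this walkthrough is where the working directory must land in HDFS: oozie.wf.application.path is assembled from ${nameNode}, ${user.name}, and ${weatherRoot}. The sketch below expands those references locally with java.util.Properties so the resulting path can be checked before copying anything. It approximates Oozie's substitution rather than reproducing it (Oozie resolves the properties itself at submission time), and user.name is assumed here to be the submitting user, oracle. JobPropertiesResolver is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: expands ${name} references in job.properties; Oozie performs its own substitution.
class JobPropertiesResolver {

    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");
    private static final int MAX_DEPTH = 10; // guards against reference cycles

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException("could not read properties", e);
        }
        return props;
    }

    /** Resolves key, looking names up first in props and then in extra (e.g. user.name). */
    static String resolve(Properties props, Properties extra, String key) {
        return expand(lookup(props, extra, key), props, extra, 0);
    }

    private static String lookup(Properties props, Properties extra, String name) {
        String value = props.containsKey(name) ? props.getProperty(name) : extra.getProperty(name);
        if (value == null) {
            throw new IllegalArgumentException("undefined property: " + name);
        }
        return value;
    }

    private static String expand(String value, Properties props, Properties extra, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("too many nested references in: " + value);
        }
        Matcher m = REFERENCE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Referenced values may themselves contain ${...}, so expand them recursively.
            String resolved = expand(lookup(props, extra, m.group(1)), props, extra, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the job.properties values above and user.name set to oracle, oozie.wf.application.path expands to hdfs://localhost:8020/user/oracle/weather_ooze, the same directory that the hive-default.xml path in workflow.xml points into.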

  • Slicing up a very big JPG map image, 49000 x 34300 pixels

    - by sirvan
    Hi, I want to write a map viewer that works with small tiles of a big map image file, so I need to cut the big image into small tiles (250 x 250 pixels or a similar size). I used ImageMagick to do it, but I ran into problems. Is there another programming method or application that can do the tiling? Can I do it with JAI in Java? How?
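A 49000 x 34300 image is roughly 5 GB of raw RGB data at 3 bytes per pixel, so decoding it into one image in memory is not practical. One way to tile it with the standard JDK (javax.imageio, used here in place of JAI) is to ask the reader for one horizontal strip at a time via ImageReadParam.setSourceRegion and cut each strip into tiles. This is a sketch: the PNG output format and the tile_<row>_<col>.png naming are assumptions, and because JPEG data is decoded sequentially, each strip read may re-decode the rows above it, so expect it to be slow but memory-bounded.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Sketch: cuts a large image into tile x tile PNGs one horizontal strip at a time,
// so the whole map is never held in a single BufferedImage.
class MapTiler {

    /** Writes tile_<row>_<col>.png into outDir and returns the number of tiles written. */
    static int writeTiles(File source, File outDir, int tile) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(source)) {
            if (in == null) {
                throw new IOException("cannot open " + source);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("no ImageIO reader for " + source);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in); // seekForwardOnly defaults to false, so the reader may seek back per strip
                int width = reader.getWidth(0);
                int height = reader.getHeight(0);
                int count = 0;
                for (int y = 0; y < height; y += tile) {
                    int h = Math.min(tile, height - y); // the last row of tiles may be shorter
                    ImageReadParam param = reader.getDefaultReadParam();
                    param.setSourceRegion(new Rectangle(0, y, width, h)); // decode one strip only
                    BufferedImage strip = reader.read(0, param);
                    for (int x = 0; x < width; x += tile) {
                        int w = Math.min(tile, width - x); // edge tiles may be narrower
                        BufferedImage part = strip.getSubimage(x, 0, w, h);
                        File out = new File(outDir, "tile_" + (y / tile) + "_" + (x / tile) + ".png");
                        ImageIO.write(part, "png", out);
                        count++;
                    }
                }
                return count;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

For example, MapTiler.writeTiles(new File("map.jpg"), new File("tiles"), 250) would produce 196 columns by 138 rows of tiles for the 49000 x 34300 map; the output directory must already exist. Each strip is about 49000 x 250 pixels, roughly 37 MB for 3-byte RGB.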

  • Create a big buffer on a PIC18F with the Microchip C18 compiler

    - by acemtp
    Using the Microchip C18 compiler with a PIC18F, I want to create a "big" buffer of 3000 bytes in the program data space. If I put this in main() (on the stack): char tab[127]; I get this error: Error [1300] stack frame too large. If I make it global, I get this error: Error - section '.udata_main.o' can not fit the section. Section '.udata_main.o' length=0x0000007f. How can I create a big buffer? Is there a tutorial on how to manage big buffers on a PIC18F with C18?

  • Slicing up a very big JPG map image, 140000 x 125000 pixels

    - by sirvan
    Hi, I want to write a map viewer that works with small tiles of a big map image file, so I need to cut the big image into small tiles (250 x 250 pixels or a similar size). I used ImageMagick to do it, but I ran into problems. Is there another programming method or application that can do the tiling? Can I do it with JAI in Java? How?

  • Drag big picture in small layer?

    - by Tronic
    Hi, I need a plugin for jQuery or another JS framework that lets me define a small div in which I can drag around a big picture, so that I only see a clipping of the picture. Any ideas? Edit: to explain further, I have a small div of about 600px x 450px. This div acts as a clipping window for a big picture of about 3000px x 2000px, so I only see a specific cutout of the big picture, and I need to drag that big picture around inside this small clipping window.

  • How to cross-reference many character encodings with ASCII OR UTFx?

    - by Garet Claborn
    I'm working with a binary structure, the goal of which is to index the significance of specific bits for any character encoding so that we may trigger events while doing specific checks against the profile. Each character encoding scheme has an associated system record. This record's leading value will be a C++ unsigned long long binary value and signifies the length, in bits, of encoded characters. Following the length are three values, each of which is a bit field of that length: offset_mask defines the occurrence of non-printable characters within the min,max of print_mask; range_mask defines the occurrence of the most popular 50% of printable characters; print_mask defines the occurrence value of printable characters. The structure of profiles has changed since the original version of this question. After reading more, I will most likely try to factorize or compress these values in the long term instead of starting out with ranges. I have to write some of the core functionality myself, mainly because: it has to fit into a particular event architecture we are using; I want a better understanding of character encoding, which I'm about to need; and integrating into a non-linear design excludes many libraries that lack special hooks. I'm unsure whether there is already a standard, cross-encoding mechanism for communicating such data. I'm just starting to look into how chardet might do profiling, as suggested by @amon. The Unicode BOM would be enough (for my current project) if all encodings were Unicode. Of course, ideally one would like to support all encodings, but I'm not asking about implementation, only the general case. How can these profiles be efficiently populated to produce a set of bitmasks that we can use to match strings with common characters in multiple languages? If you have any editing suggestions, please feel free; I am a lightweight when it comes to localization, which is why I'm trying to reach out to the more experienced.
Any caveats you may be able to help with will be appreciated.
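For single-byte encodings, one brute-force way to populate a print_mask is to decode each of the 256 byte values and record which ones come out as a printable character. The sketch below uses Java's java.nio.charset rather than C++, and it assumes print_mask is a 256-bit set indexed by byte value and that "printable" means a defined, non-control character other than the U+FFFD replacement character; both are assumptions about the profile format. Multi-byte encodings (UTF-8, Shift_JIS, and so on) need sequence-level analysis that this does not attempt.

```java
import java.nio.charset.Charset;
import java.util.BitSet;

// Sketch: builds a 256-bit print mask for a single-byte encoding by decoding every byte value.
class ByteEncodingProfile {

    /** Bit b is set when byte value b decodes to exactly one printable character. */
    static BitSet printMask(Charset charset) {
        BitSet mask = new BitSet(256);
        for (int b = 0; b < 256; b++) {
            String decoded = new String(new byte[] {(byte) b}, charset);
            if (decoded.length() != 1) {
                continue; // not a single UTF-16 code unit, so not a simple one-byte character
            }
            char c = decoded.charAt(0);
            boolean printable = c != '\uFFFD'        // malformed or unmappable byte was replaced
                    && !Character.isISOControl(c)    // C0 controls, DEL, and C1 controls
                    && Character.isDefined(c);
            if (printable) {
                mask.set(b);
            }
        }
        return mask;
    }
}
```

Under these definitions, printMask(StandardCharsets.US_ASCII) sets the 95 bits for 0x20 through 0x7E, and printMask(StandardCharsets.ISO_8859_1) sets 191 bits, adding 0xA0 through 0xFF. The other two masks could be derived from this one in the same pass.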

  • Fast Data: Go Big. Go Fast.

    - by J Swaroop
    Cross-posting Dain Hansen's excellent recap of the Big Data/Fast Data announcement during OOW: For those of you who may have missed it, today’s second full day of Oracle OpenWorld 2012 started with a rumpus. Joe Tucci from EMC outlined the human face of big data with real examples of how big data is transforming our world. And not the usual tried-and-true weblog examples, but real stories about taxi cab drivers in Singapore using big data to optimize their routes, as well as folks just trying to get a better haircut. Next we heard from Thomas Kurian, who talked at length about the important platform characteristics of Oracle’s Cloud and, more specifically, Oracle’s expanded Cloud Services portfolio. Especially interesting to our integration customers is the messaging support for Oracle’s Cloud applications. What this means is that Oracle’s Cloud applications now have a lightweight integration fabric that on-premise applications can communicate with via REST APIs using Oracle SOA Suite. It’s an important element of our strategy at Oracle, supporting the idea that whether your requirements are for private or public cloud, Oracle has a solution in the Cloud for all of your applications, and we give you more deployment choice than any other vendor. If this wasn’t enough to get the juices flowing, later that morning we heard from Hasan Rizvi, who outlined in his Fusion Middleware session the four most important enterprise imperatives: Social, Mobile, Cloud, and a brand new one: Fast Data.
Today, Rizvi took an important step in defining this term, explaining that he believes it’s a convergence of four essential technology elements:

    - Event processing for event filtering and business rules, with Oracle Event Processing
    - Data transformation and loading, with Oracle Data Integrator
    - Real-time replication and integration, with Oracle GoldenGate
    - Analytics and data discovery, with Oracle Business Intelligence

Each of these four elements can be considered (and architected) together on a single integrated platform that can help customers integrate any type of data (structured, semi-structured), leveraging new styles of big data technologies (MapReduce, HDFS, Hive, NoSQL) to process more volume and variety of data at a faster velocity with greater results. Fast data processing (and especially real-time) has always been our credo at Oracle with each one of these products in Fusion Middleware. For example, Oracle GoldenGate continues to get faster with the recent 11g R2 release, which brings even greater optimization for Oracle Database with Integrated Capture, as well as some new heterogeneity capabilities. With Oracle Data Integrator and the Big Data Connectors, we’re seeing much improved performance by running MapReduce transformations natively on Hadoop systems. And with Oracle Event Processing, we’re seeing some remarkable performance with customers like NTT Docomo. Check out their upcoming session at Oracle OpenWorld on Wednesday to hear more about how this customer is using event processing and big data together. If you missed any of these sessions and keynotes, not to worry: there are on-demand versions available on the Oracle OpenWorld website. You can also check out our upcoming webcast, where we will outline some of these new breakthroughs in data integration technologies for Big Data, Cloud, and Real-time in more detail.

  • Developing an analytics system that processes large amounts of data: where to start?

    - by Ryan
    Imagine you're writing some sort of web analytics system: you're recording raw page hits along with some extra things like tagging cookies, and then producing stats such as:
    - which pages got the most traffic over a time period
    - which referrers sent the most traffic
    - goals completed (a goal being a view of a particular page)
    - more advanced things, like which referrers sent the most visitors who later hit a goal.
    The naive way of approaching this would be to throw it in a relational database and run queries over it, but that won't scale. You could pre-calculate everything (have a queue of incoming 'hits' and use it to update report tables), but what if you later change a goal? How could you efficiently re-calculate just the data that would be affected? Obviously this has been done before ;) so any tips on where to start, methods and examples, architecture, technologies, etc. are welcome.
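One pattern for the "what if I later change a goal" problem is to keep the raw hits as an append-only log that is never rewritten, maintain cheap counters (such as hits per page) incrementally as hits arrive, and rebuild goal-dependent reports by replaying the log whenever a goal definition changes. The in-memory sketch below only illustrates that split; the class and field names are hypothetical, and at real scale the log would live in durable storage and the replay would run as a batch job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: incrementally maintained counters plus an append-only raw log that can be replayed.
class HitAggregator {

    /** One raw page view; visitorId stands in for the tagging cookie (hypothetical fields). */
    static final class Hit {
        final String visitorId;
        final String page;
        final String referrer; // null when the hit did not come from an external referrer

        Hit(String visitorId, String page, String referrer) {
            this.visitorId = visitorId;
            this.page = page;
            this.referrer = referrer;
        }
    }

    private final List<Hit> rawLog = new ArrayList<>();            // source of truth, only appended to
    private final Map<String, Long> hitsPerPage = new HashMap<>(); // cheap report, updated per hit

    void ingest(Hit hit) {
        rawLog.add(hit);
        hitsPerPage.merge(hit.page, 1L, Long::sum);
    }

    long hitsFor(String page) {
        return hitsPerPage.getOrDefault(page, 0L);
    }

    /**
     * Goal-dependent report rebuilt from the raw log: for each visitor's first referrer, the number
     * of those visitors who viewed goalPage at or after that referred hit.
     */
    Map<String, Integer> goalConversionsByReferrer(String goalPage) {
        Map<String, String> firstReferrer = new HashMap<>();
        Set<String> converted = new HashSet<>();
        for (Hit h : rawLog) { // arrival order, so "later" means later in the list
            if (h.referrer != null) {
                firstReferrer.putIfAbsent(h.visitorId, h.referrer);
            }
            if (h.page.equals(goalPage) && firstReferrer.containsKey(h.visitorId)) {
                converted.add(h.visitorId);
            }
        }
        Map<String, Integer> byReferrer = new HashMap<>();
        for (String visitor : converted) {
            byReferrer.merge(firstReferrer.get(visitor), 1, Integer::sum);
        }
        return byReferrer;
    }
}
```

Changing a goal then means calling goalConversionsByReferrer with the new goal page; the per-page counters and the raw log are untouched.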

  • single for-loop runtime explanation problem

    - by owwyess
    I am analyzing the running times of different for-loops, and as I learn more, I'm curious to understand this problem, which I have yet to figure out. I have this exercise called "How many stars are printed": for (int i = N; i > 1; i = i/2) System.out.println("*"); The answers to pick from are A: ~log N B: ~N C: ~N log N D: ~0.5N^2. The answer should be A, and I agree with that, but on the other hand: let's say N = 500. What would log N be then? It would be 2.7. So what if we say that N = 500 in the exercise above? That would most definitely print more than 2.7 stars, so how is that related? It makes sense to say that if the for-loop looked like this: for (int i = 0; i < N; i++) it would print N stars. I hope to find an explanation for this here; maybe I'm interpreting all these things wrong and thinking about it in a bad way. Thanks in advance.
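The loop body runs once per halving of i, so the number of stars is floor(log2 N): for N = 500, i takes the values 500, 250, 125, 62, 31, 15, 7, 3, which is 8 stars. The 2.7 in the question is the base-10 logarithm. Since log2 N = log10 N x log2 10, and log2 10 is about 3.32, logarithms in different bases differ only by a constant factor, which is why growth-rate answers often leave the base unstated. The sketch below counts the iterations instead of printing them and compares the count with both logarithms; HalvingLoop is just an illustrative name.

```java
import java.util.Locale;

// Counts how many times the exercise's loop body runs and compares it with log2 N and log10 N.
class HalvingLoop {

    /** Same loop as the exercise, but counting instead of printing. */
    static int countStars(int n) {
        int stars = 0;
        for (int i = n; i > 1; i = i / 2) {
            stars++;
        }
        return stars;
    }

    /** floor(log2 n) for n >= 1, from the position of the highest set bit. */
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int n = 500;
        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
        System.out.printf(Locale.ROOT, "N=%d stars=%d log2(N)=%.2f log10(N)=%.2f%n",
                n, countStars(n), Math.log(n) / Math.log(2), Math.log10(n));
    }
}
```

Running main prints N=500 stars=8 log2(N)=8.97 log10(N)=2.70, and countStars(n) equals floorLog2(n) for every n >= 1.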
