100000 - Page 18 - Developer IT

Fast JSON serialization (and comparison with Pickle) for cluster computing in Python?

- by user248237

I have a set of data points, each described by a dictionary. The processing of each data point is independent and I submit each one as a separate job to a cluster. Each data point has a unique name, and my cluster submission wrapper simply calls a script that takes a data point's name and a file describing all the data points. That script then accesses the data point from the file and performs the computation. Since each job has to load the set of all points only to retrieve the point to be run, I wanted to optimize this step by serializing the file describing the set of points into an easily retrievable format. I tried using JSONpickle, using the following method, to serialize a dictionary describing all the data points to file: def json_serialize(obj, filename, use_jsonpickle=True): f = open(filename, 'w') if use_jsonpickle: import jsonpickle json_obj = jsonpickle.encode(obj) f.write(json_obj) else: simplejson.dump(obj, f, indent=1) f.close() The dictionary contains very simple objects (lists, strings, floats, etc.) and has a total of 54,000 keys. The json file is ~20 Megabytes in size. It takes ~20 seconds to load this file into memory, which seems very slow to me. I switched to using pickle with the same exact object, and found that it generates a file that's about 7.8 megabytes in size, and can be loaded in ~1-2 seconds. This is a significant improvement, but it still seems like loading of a small object (less than 100,000 entries) should be faster. Aside from that, pickle is not human readable, which was the big advantage of JSON for me. Is there a way to use JSON to get similar or better speed ups? If not, do you have other ideas on structuring this? (Is the right solution to simply "slice" the file describing each event into a separate file and pass that on to the script that runs a data point in a cluster job? It seems like that could lead to a proliferation of files). thanks.

Read the article

Neural Network settings for fast training

- by danpalmer

I am creating a tool for predicting the time and cost of software projects based on past data. The tool uses a neural network to do this and so far, the results are promising, but I think I can do a lot more optimisation just by changing the properties of the network. There don't seem to be any rules or even many best-practices when it comes to these settings so if anyone with experience could help me I would greatly appreciate it. The input data is made up of a series of integers that could go up as high as the user wants to go, but most will be under 100,000 I would have thought. Some will be as low as 1. They are details like number of people on a project and the cost of a project, as well as details about database entities and use cases. There are 10 inputs in total and 2 outputs (the time and cost). I am using Resilient Propagation to train the network. Currently it has: 10 input nodes, 1 hidden layer with 5 nodes and 2 output nodes. I am training to get under a 5% error rate. The algorithm must run on a webserver so I have put in a measure to stop training when it looks like it isn't going anywhere. This is set to 10,000 training iterations. Currently, when I try to train it with some data that is a bit varied, but well within the limits of what we expect users to put into it, it takes a long time to train, hitting the 10,000 iteration limit over and over again. This is the first time I have used a neural network and I don't really know what to expect. If you could give me some hints on what sort of settings I should be using for the network and for the iteration limit I would greatly appreciate it. Thank you!

Read the article

In Perl, is a while loop generally faster than a for loop?

- by Mike

I've done a small experiment as will be shown below and it looks like that a while loop is faster than a for loop in Perl. But since the experiment was rather crude, and the subject might be a lot more complicated than it seems, I'd like to hear what you have to say about this. Thanks as always for any comments/suggestions :) In the following two small scripts, I've tried while and for loops separately to calcaulte the factorial of 100,000. The one that has the while loop took 57 minutes 17 seconds to finish while the for loop equivalent took 1 hour 7 minutes 54 seconds. Script that has while loop: use strict; use warnings; use bigint; my $now = time; my $n =shift; my $s=1; while(1){ $s *=$n; $n--; last if $n==2; } print $s*$n; $now = time - $now; printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600), int(($now % 3600) / 60), int($now % 60)); Script that has for loop: use strict; use warnings; use bigint; my $now = time; my $n =shift; my $s=1; for (my $i=2; $i<=$n;$i++) { $s = $s*$i; } print $s; $now = time - $now; printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($now / 3600), int(($now % 3600) / 60), int($now % 60));

Read the article

How to improve performance of non-scalar aggregations on denormalized tables

- by The Lazy DBA

Suppose we have a denormalized table with about 80 columns, and grows at the rate of ~10 million rows (about 5GB) per month. We currently have 3 1/2 years of data (~400M rows, ~200GB). We create a clustered index to best suit retrieving data from the table on the following columns that serve as our primary key... [FileDate] ASC, [Region] ASC, [KeyValue1] ASC, [KeyValue2] ASC ... because when we query the table, we always have the entire primary key. So these queries always result in clustered index seeks and are therefore very fast, and fragmentation is kept to a minimum. However, we do have a situation where we want to get the most recent FileDate for every Region, typically for reports, i.e. SELECT [Region] , MAX([FileDate]) AS [FileDate] FROM HugeTable GROUP BY [Region] The "best" solution I can come up to this is to create a non-clustered index on Region. Although it means an additional insert on the table during loads, the hit isn't minimal (we load 4 times per day, so fewer than 100,000 additional index inserts per load). Since the table is also partitioned by FileDate, results to our query come back quickly enough (200ms or so), and that result set is cached until the next load. However I'm guessing that someone with more data warehousing experience might have a solution that's more optimal, as this, for some reason, doesn't "feel right".

Read the article

PHP Form Sending Information to Limbo!

- by drew

I was told my client's quote form has not been generating very many emails. I have learned that although the form brings you to a confirmation page, the information never reaches the recipient. I have altered the code so it goes to my office email for testing purposes. If I post code for the form elements below, would someone be able to spot what the problem might be? Thank you very much! Link to the quote page is http://autoglass-plus.com/quote.php First is the form itself: <form id="quoteForm" name="form" action="form/index.php" method="post"> <fieldset> <p> <strong>Contact Information:</strong><br /> </p> <div> <label for="firstname">First Name:<br /> </label> <input type="text" size="30" name="firstname" class="txt" id="firstname" /> </div> <div> <label for="lastname">Last Name:<br /> </label> <input type="text" size="30" name="lastname" class="txt" id="lastname" /> </div> <div> <label for="address">Address:<br /> </label> <input type="text" size="30" name="address" class="txt" id="address" /> </div> <div> <label for="city">City:<br /> </label> <input type="text" size="30" name="city" class="txt" id="city" /> </div> <div> <label for="state">State:<br /> </label> <input type="text" size="30" name="state" class="txt" id="state" /> </div> <div> <label for="zip">Zip:<br /> </label> <input type="text" size="30" name="zip" class="txt" id="zip" /> </div> <div> <label for="label">Phone:<br /> </label> <input type="text" size="30" name="phone" class="txt" id="label" /> </div> <div> <label for="email">Email:<br /> </label> <input type="text" size="30" name="email" class="txt" id="email" /> </div> <p><br /> <b>Insurace Information</b></p> <p><i>Auto Glass Plus in an Approved Insurance Vendor. Insurance claims require additional information that we will request when we contact you for your quote.</i></p> <br /> <div> <input type="checkbox" name="insurance" value="yes" /> Check here if this is an insurance claim.<br /> <label for="year">Insurance Provider:<br /> </label> <input type="text" size="30" name="provider" class="txt" id="provider" /> </div> <p><br /> <b>Vehicle Information:</b><br /> </p> <div> <label for="year">Vehicle Year :<br /> </label> <input type="text" size="30" name="year" class="txt" id="year" /> </div> <div> <label for="make">Make: </label> <br /> <input type="text" size="30" name="make" class="txt" id="make" /> </div> <div> <label for="model">Model:</label> <br /> <input type="text" size="30" name="model" class="txt" id="model" /> </div> <div> <label for="body">Body Type:<br /> </label> <select name="body" id="body"> <option>Select One</option> <option value="2 Door Hatchback">2 Door Hatchback</option> <option value="4 Door Hatchback">4 Door Hatchback</option> <option value="2 Door Sedan">2 Door Sedan</option> <option value="4 Door Sedan">4 Door Sedan</option> <option value="Station Wagon">Station Wagon</option> <option value="Van">Van</option> <option value="Sport Utility">Sport Utility</option> <option value="Pickup Truck">Pickup Truck</option> <option value="Other Truck">Other Truck</option> <option value="Recreational Vehicle">Recreational Vehicle</option> <option value="Other">Other</option> </select> </div> <p><b><br /> Glass in Need of Repair:</b><br /> </p> <div> <input type="checkbox" name="repairs" value="Windshield" /> Windshield<br /> <input type="checkbox" name="repairs" value="Back Glass" /> Back Glass<br /> <input type="checkbox" name="repairs" value="Driver’s Side Window" /> Side Window*<br /> <input type="checkbox" name="repairs" value="Chip Repair" /> Chip Repair<br /> <input type="checkbox" name="repairs" value="Other" /> Other </div> <p><strong>*Important:</strong> For side glass, please indicate the specific window that needs replacement <i>(e.g. passenger side rear door or driver side vent glass)</i>, and any tinting color preference in the <strong>Describe Damage </strong> field.</p> <p><br /> <b>Describe Damage</b></p> <div> <textarea rows="6" name="damage" id="damage" cols="37" class="txt"></textarea> </div> <input type="hidden" name="thanks" value="../thanks.php" /> <input type="hidden" name="required_fields" value="firstname, lastname, email, phone" /> <input type="hidden" name="html_template" value="testform.tpl.html" /> <input type="hidden" name="mail_template" value="testmail.tpl.txt" /> <div class="submit"> <center> <input type="submit" value="Submit Form" name="Submit" id="Submit" /> </center> </div> </fieldset> </form> Then it sends to a file named index.php inside the "form" folder: <?php $script_root = './'; $referring_server = 'www.wmsgroup.com, wmsgroup.com, scripts'; $allow_empty_referer = 'yes'; // (yes, no) $language = 'en'; // (see folder 'languages') $ip_banlist = ''; $ip_address_count = '0'; $ip_address_duration = '48'; $show_limit_errors = 'yes'; // (yes, no) $log_messages = 'no'; // (yes, no) $text_wrap = '65'; $show_error_messages = 'yes'; $attachment = 'no'; // (yes, no) $attachment_files = 'jpg, gif,png, zip, txt, pdf, doc, ppt, tif, bmp, mdb, xls, txt'; $attachment_size = 100000; $path['logfile'] = $script_root . 'logfile/logfile.txt'; $path['upload'] = $script_root . 'upload/'; // chmod 777 upload $path['templates'] = $script_root . 'templates/'; $file['default_html'] = 'testform.tpl.html'; $file['default_mail'] = 'testmail.tpl.txt'; /***************************************************** ** Add further words, text, variables and stuff ** that you want to appear in the templates here. ** The values are displayed in the HTML output and ** the e-mail. *****************************************************/ $add_text = array( 'txt_additional' => 'Additional', // {txt_additional} 'txt_more' => 'More' // {txt_more} ); /***************************************************** ** Do not edit below this line - Ende der Einstellungen *****************************************************/ /***************************************************** ** Send safety signal to included files *****************************************************/ define('IN_SCRIPT', 'true'); /***************************************************** ** Load formmail script code *****************************************************/ include($script_root . 'inc/formmail.inc.php') ?> There is also formail.inc.php, testform.tpl.php, testform.tpl.text and then the confirmation page, thanks.php I want to know how these all work together and what the problem could be.

Read the article

Format for storing contacts in a database

- by Gart

I'm thinking of the best way to store personal contacts in a database for a business application. The traditional and straightforward approach would be to create a table with columns for each element, i.e. Name, Telephone Number, Job title, Address, etc... However, there are known industry standards for this kind of data, like for example vCard, or hCard, or vCard-RDF/XML or even Windows Contacts XML Schema. Utilizing an standard format would offer some benefits, like inter-operablilty with other systems. But how can I decide which method to use? The requirements are mainly to store the data. Search and ordering queries are highly unlikely but possible. The volume of the data is 100,000 records at maximum. My database engine supports native XML columns. I have been thinking to use some XML-based format to store the personal contacts. Then it will be possible to utilize XML indexes on this data, if searching and ordering is needed. Is this a good approach? Which contacts format and schema would you recommend for this?

Read the article

Theory: "Lexical Encoding"

- by _ande_turner_

I am using the term "Lexical Encoding" for my lack of a better one. A Word is arguably the fundamental unit of communication as opposed to a Letter. Unicode tries to assign a numeric value to each Letter of all known Alphabets. What is a Letter to one language, is a Glyph to another. Unicode 5.1 assigns more than 100,000 values to these Glyphs currently. Out of the approximately 180,000 Words being used in Modern English, it is said that with a vocabulary of about 2,000 Words, you should be able to converse in general terms. A "Lexical Encoding" would encode each Word not each Letter, and encapsulate them within a Sentence. // An simplified example of a "Lexical Encoding" String sentence = "How are you today?"; int[] sentence = { 93, 22, 14, 330, QUERY }; In this example each Token in the String was encoded as an Integer. The Encoding Scheme here simply assigned an int value based on generalised statistical ranking of word usage, and assigned a constant to the question mark. Ultimately, a Word has both a Spelling & Meaning though. Any "Lexical Encoding" would preserve the meaning and intent of the Sentence as a whole, and not be language specific. An English sentence would be encoded into "...language-neutral atomic elements of meaning ..." which could then be reconstituted into any language with a structured Syntactic Form and Grammatical Structure. What are other examples of "Lexical Encoding" techniques? If you were interested in where the word-usage statistics come from : http://www.wordcount.org

Read the article

Effective Method to Manage and Search Through 100,000+ Objects Instantly? (C#)

- by Kirk

I'm writing a media player for enthusiasts with large collections (over 100,000 tracks) and one of my main goals is speed in search. I would like to allow the user to perform a Google-esque search of their entire music collection based on these factors: Song Path and File Name Items in ID3 Tag (Title, Artist, Album, etc.) Lyrics What is the best way for me to store this data and search through it? Currently I am storing each track in an object and iterating over an array of these objects checking each of their variables for string matches based on given search text. I've run into problems though where my search is not effective because it is always a phrase search and I'm not sure how to make it more fuzzy. Would an internal DB like SQLlite be faster than this? Any ideas on how I should structure this system? I also need playlist persistence, so that when they close the app and open the app their same playlist loads immediately. How should I store the playlist information so it can load quickly when the application starts? Currently I am JSON encoding the entire playlist, storing it in a text file, and reading it into the ListView at runtime, but it is getting sluggish over 20,000 tracks. Thanks!

Read the article

Image processing on bifurcation diagram to get small eps size

- by yCalleecharan

Hello, I'm producing bifurcation diagrams (which are normally used in nonlinear dynamics). These diagrams identify abrupt changes in topologies due to stability changes. These abrupt changes occur as one or more parameters pass through some critical value(s). An example is here: http://en.wikipedia.org/wiki/File:LogisticMap_BifurcationDiagram.png On the above figure, some image processing has been done so as to make the plot more visually pleasant. A bifurcation diagram usually contains hundreds of thousands of points and the resulting eps file can become very big. Journal submission in the LaTeX format require that figures are to be submitted in the eps format. In my case one of such figures can result in about 6 MB in Matlab and even much more in Gnuplot. For the example in the above figure, 100,000 x values are calculated for each r and one can imagine that the resulting eps file would be huge. The site however explains some image processing that makes the plot more visually pleasing. Can anyone explain to me stepwise how go about? I can't understand the explanation provided in the "summary" section. Will the resulting image processing also reduce the figure size? Furthermore, any tips on reducing the file size of such a huge eps figure? Thanks a lot...

Read the article

Event feed implementation - will it scale?

- by SlappyTheFish

Situation: I am currently designing a feed system for a social website whereby each user has a feed of their friends' activities. I have two possible methods how to generate the feeds and I would like to ask which is best in terms of ability to scale. Events from all users are collected in one central database table, event_log. Users are paired as friends in the table friends. The RDBMS we are using is MySQL. Standard method: When a user requests their feed page, the system generates the feed by inner joining event_log with friends. The result is then cached and set to timeout after 5 minutes. Scaling is achieved by varying this timeout. Hypothesised method: A task runs in the background and for each new, unprocessed item in event_log, it creates entries in the database table user_feed pairing that event with all of the users who are friends with the user who initiated the event. One table row pairs one event with one user. The problems with the standard method are well known – what if a lot of people's caches expire at the same time? The solution also does not scale well – the brief is for feeds to update as close to real-time as possible The hypothesised solution in my eyes seems much better; all processing is done offline so no user waits for a page to generate and there are no joins so database tables can be sharded across physical machines. However, if a user has 100,000 friends and creates 20 events in one session, then that results in inserting 2,000,000 rows into the database. Question: The question boils down to two points: Is this worst-case scenario mentioned above problematic, i.e. does table size have an impact on MySQL performance and are there any issues with this mass inserting of data for each event? Is there anything else I have missed?

Read the article

Sudoku Recursion Issue (Java)

- by SkylineAddict

I'm having an issue with creating a random Sudoku grid. I tried modifying a recursive pattern that I used to solve the puzzle. The puzzle itself is a two dimensional integer array. This is what I have (By the way, the method doesn't only randomize the first row. I had an idea to randomize the first row, then just decided to do the whole grid): public boolean randomizeFirstRow(int row, int col){ Random rGen = new Random(); if(row == 9){ return true; } else{ boolean res; for(int ndx = rGen.nextInt() + 1; ndx <= 9;){ //Input values into the boxes sGrid[row][col] = ndx; //Then test to see if the value is valid if(this.isRowValid(row, sGrid) && this.isColumnValid(col, sGrid) && this.isQuadrantValid(row, col, sGrid)){ // grid valid, move to the next cell if(col + 1 < 9){ res = randomizeFirstRow(row, col+1); } else{ res = randomizeFirstRow( row+1, 0); } //If the value inputed is valid, restart loop if(res == true){ return true; } } } } //If no value can be put in, set value to 0 to prevent program counting to 9 setGridValue(row, col, 0); //Return to previous method in stack return false; } This results in an ArrayIndexOutOfBoundsException with a ridiculously high or low number (+- 100,000). I've tried to see how far it goes into the method, and it never goes beyond this line: if(this.isRowValid(row, sGrid) && this.isColumnValid(col, sGrid) && this.isQuadrantValid(row, col, sGrid)) I don't understand how the array index goes so high. Can anyone help me out?

Read the article

GROUP BY ID range?

- by d0ugal

Given a data set like this; +-----+---------------------+--------+ | id | date | result | +-----+---------------------+--------+ | 121 | 2009-07-11 13:23:24 | -1 | | 122 | 2009-07-11 13:23:24 | -1 | | 123 | 2009-07-11 13:23:24 | -1 | | 124 | 2009-07-11 13:23:24 | -1 | | 125 | 2009-07-11 13:23:24 | -1 | | 126 | 2009-07-11 13:23:24 | -1 | | 127 | 2009-07-11 13:23:24 | -1 | | 128 | 2009-07-11 13:23:24 | -1 | | 129 | 2009-07-11 13:23:24 | -1 | | 130 | 2009-07-11 13:23:24 | -1 | | 131 | 2009-07-11 13:23:24 | -1 | | 132 | 2009-07-11 13:23:24 | -1 | | 133 | 2009-07-11 13:23:24 | -1 | | 134 | 2009-07-11 13:23:24 | -1 | | 135 | 2009-07-11 13:23:24 | -1 | | 136 | 2009-07-11 13:23:24 | -1 | | 137 | 2009-07-11 13:23:24 | -1 | | 138 | 2009-07-11 13:23:24 | 1 | | 139 | 2009-07-11 13:23:24 | 0 | | 140 | 2009-07-11 13:23:24 | -1 | +-----+---------------------+--------+ How would I go about grouping the results by day 5 records at a time. The above results is part of the live data, there is over 100,000 results rows in the table and its growing. Basically I want to measure the change over time, so want to take a SUM of the result every X records. In the real data I'll be doing it ever 100 or 1000 but for the data above perhaps every 5. If i could sort it by date I would do something like this; SELECT DATE_FORMAT(date, '%h%i') ym, COUNT(result) 'Total Games', SUM(result) as 'Score' FROM nn_log GROUP BY ym; I can't figure out a way of doing something similar with numbers. The order is sorted by the date but I hope to split the data up every x results. It's safe to assume there are no blank rows. Doing it above with the data you could do multiple selects like; SELECT SUM(result) FROM table LIMIT 0,5; SELECT SUM(result) FROM table LIMIT 5,5; SELECT SUM(result) FROM table LIMIT 10,5; Thats obviously not a very good way to scale up to a bigger problem. I could just write a loop but I'd like to reduce the number of queries.

Read the article

select all values from a dimension for which there are facts in all other dimensions

- by ideasculptor

I've tried to simplify for the purposes of asking this question. Hopefully, this will be comprehensible. Basically, I have a fact table with a time dimension, another dimension, and a hierarchical dimension. For the purposes of the question, let's assume the hierarchical dimension is zip code and state. The other dimension is just descriptive. Let's call it 'customer' Let's assume there are 50 customers. I need to find the set of states for which there is at least one zip code in which EVERY customer has at least one fact row for each day in the time dimension. If a zip code has only 49 customers, I don't care about it. If even one of the 50 customers doesn't have a value for even 1 day in a zip code, I don't care about it. Finally, I also need to know which zip codes qualified the state for selection. Note, there is no requirement that every zip code have a full data set - only that at least one zip code does. I don't mind making multiple queries and doing some processing on the client side. This is a dataset that only needs to be generated once per day and can be cached. I don't even see a particularly clean way to do it with multiple queries short of simply brute-force iteration, and there are a heck of a lot of 'zip codes' in the data set (not actually zip codes, but the there are approximately 100,000 entries in the lower level of the hierarchy and several hundred in the top level, so zipcode-state is a reasonable analogy)

Read the article

Database for managing large volumes of (system) metrics

- by symcbean

Hi, I'm looking at building a system for managing and reporting stats on web page performance. I'll be collecting a lot more stats than are available in the standard log formats (approx 20 metrics) but compared to most types of database applications, the base data structure will be very simple. My problem is that I'll be accumulating a lot of data - in the region of 100,000 records (i.e. sets of metrics) per hour. Of course, resources are very limited! So that its possible to sensibly interact with the data, I'd need to consolidate each metric into one minute bins, broken down by URL, then for anything more than 1 day old, consolidated into 10 minute bins, then at 1 week, hourly bins. At the front end, I want to provide a view (prefereably as plots) of the last hour of data, with the facility for users to drill up/down through defined hierarchies of URLs (which do not always map directly to the hierarchy expressed in the path of the URL) and to view different time frames. Rather than coding all this myself and using a relational database, I was wondering if there were tools available which would facilitate both the management of the data and the reporting. I had a look at Mondrian however I can't see from the documentation I've looked at whether it's possible to drop the more granular information while maintaining the consolidated views of the data. RRDTool looks promising in terms of managing the data consolidation, but seems to be rather limited in terms of querying the dataset as a multi-dimensional/relational database. What else whould I be looking at?

Read the article

different thread accessing MemoryStream

- by Wayne

There's a bit of code which writes data to a MemoryStream object directly into it's data buffer by calling GetBuffer(). It also uses and updates the Position and SetLength() properties appropriately. This code works purposes 99.9999% of the time. Literally. Only every so many 100,000's of iterations it will barf. The specific problem is that the memory.Position property suddenly returns zero instead of the appropriate value. However, code was added that checks for the 0 and throws an exception which include log of the MemoryStream properties like Position and Length in a separate method. Those return the correct value. Further addition shows that when this rare condition occurs, the memory.Position only has zero inside this particular method. Okay. Obviously, this must be a threading issue. But this code is well locked. However, the nature of this software is that it's organized by "tasks" with a scheduler and so any one of several actual O/S thread may run this code at any give time--but never more than one at a time. So it's my guess that ordinarily it so happens that the same thread keeps getting used for this method and then on a rare occasion a different thread get used. Then due to compiler optimizations, the different thread never gets the correct value. It gets a "stale" value. Ordinarily in a situation like this I would apply a "volatile" keyword to the variable in question. But that (those) variables are inside the MemoryStream object. Does anyone have any other idea? Or does this mean we have to implement our own MemoryStream object? (Just like we end up having to do with practically every collection in .NET?) It's a shame to have such an awesome platform as .NET and have virtually the entire system useless as-is for seriously parallelized applications. If I'm wrong or you have other ideas, please advise. Sincerely, Wayne

Read the article

How to avoid geometric slowdown with large Linq transactions?

- by Shaul

I've written some really nice, funky libraries for use in LinqToSql. (Some day when I have time to think about it I might make it open source... :) ) Anyway, I'm not sure if this is related to my libraries or not, but I've discovered that when I have a large number of changed objects in one transaction, and then call DataContext.GetChangeSet(), things start getting reaalllly slooowwwww. When I break into the code, I find that my program is spinning its wheels doing an awful lot of Equals() comparisons between the objects in the change set. I can't guarantee this is true, but I suspect that if there are n objects in the change set, then the call to GetChangeSet() is causing every object to be compared to every other object for equivalence, i.e. at best (n^2-n)/2 calls to Equals()... Yes, of course I could commit each object separately, but that kinda defeats the purpose of transactions. And in the program I'm writing, I could have a batch job containing 100,000 separate items, that all need to be committed together. Around 5 billion comparisons there. So the question is: (1) is my assessment of the situation correct? Do you get this behavior in pure, textbook LinqToSql, or is this something my libraries are doing? And (2) is there a standard/reasonable workaround so that I can create my batch without making the program geometrically slower with every extra object in the change set?

Read the article

hash fragments and collisions cont.

- by Mark

For this application I've mine I feel like I can get away with a 40 bit hash key, which seems awfully low, but see if you can confirm my reasoning (I want a small key because I want a small filename and the key will be converted to a filename): (Note: only accidental collisions a concern - no security issues.) A key point here is that the population in question is divided into groups, and a collision is only relevant if it occurs within the same group. A "group" is a directory on a user's system (the contents of files are hashed and a collision is only relevant if it occurs for files within the same directory). So with speculating roughly 100,000 potential users, say 2^17, that corresponds to 2^18 "groups" assuming 2 directories per user on average. So with a 40 bit key I can expect 2^(20+9) files created (among all users) before a collision occurs for some user somewhere. (Or IOW 2^((40+18)/2), due to the "birthday effect".) That's an average 4096 unique files created per user, for 2^17 users, before a single collision occurs for some user somewhere. And then that long again before another collision occurs somewhere (right?)

Read the article

Excel: Automating the Selection of an Unknown Number of Cells

- by user1905080

I’m trying to automate the formatting of an excel file by a macro and am seeking a solution. I have two columns titled Last Name and First Name which I would like to concatenate into a separate column titled Last Name, First Name. This is simple enough when done by hand: create one cell which does this, then drag that cell to include all cells within the range. The problem appears when trying to automate this. Because I can’t know the number of names that need to be concatenated ahead of time, I can’t automate the selection of cells by dragging. Can you help me automate this? I’ve tried a process of copying the initial concatenated cell, highlighting the column, and then pasting. I’ve also tried to use a formula which returned the concatenation only if there is text in the “Last Name” and “First Name” columns. However, in both cases, I end up with some 100,000 rows, putting a serious cramp on my ability to manipulate the worksheet. The best solution I can think of is to create concatenations within a fixed range of cells. Although this would create useless cells, at least there wouldn’t be 99,900 of them.

Read the article

Last GUID used up - new ScottGuID unique ID to replace it

- by Eilon

You might have heard in recent news that the last ever GUID was used up. The GUID {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF} was just consumed by a soon to be released project at Microsoft. Immediately after the GUID's creation the word spread around the Microsoft campuses around the globe. Microsoft's approximately 100,000 worldwide employees then started blogging, tweeting, and facebooking about the dubious "achievement." The following screenshot shows GUIDGEN (the Windows tool for creating GUIDs) with the last ever GUID. All GUIDs created by projects at Microsoft must be registered in a central repository for record keeping. This allows quick-fix engineers, security engineers, anti-malware developers, and testers to do a quick look up of an unknown GUID and find out if it belongs to Microsoft. The following screenshot shows the Microsoft GUID Tracker internal application and the last few GUIDs being used up by various Microsoft projects. What is perhaps more interesting than the news about the GUID is the project that used that last GUID. The recent announcements regarding the development experience for the Windows Phone 7 Series (WP7S) all involve free editions of Visual Studio 2010. One of the lesser known developer tools is based on a resurrected project that many of you are probably familiar with, but have never used. The tool is in fact Microsoft Bob 7 Series (MB7S). MB7S is an agent-based approach for mobile phone app development. The UI incorporates both natural language interfaces and motion gesture behaviors, similar to the Windows Phone 7 Series “Metro” interface. If it works, it will help to expand the breadth of mobile app developers. After the GUID: The ScottGuID It came as no big surprise that eventually the last GUID would be used up. Knowing this, a group of engineers at Microsoft has designed, implemented, and tested a replacement to the GUID: The ScottGuID. There are several core principles of the ScottGuID: 1. The concepts used in ScottGuIDs must be easily understood by a developer who is already familiar with GUIDs 2. There must exist a compatibility layer between ScottGuIDs and GUIDs 3. A ScottGuID must be usable in a practical manner in non-computing environments 4. There must exist ScottGuID APIs for all common platforms: Win32/Win64/WinCE, .NET (incl. Silverlight), Linux, FreeBSD, MacOS (incl. iPhone OS), Symbian, RIM BlackBerry, Google Android, etc. 5. ScottGuIDs must never run out ScottGuID use cases One of the more subtle principles of the ScottGuID is principle #3. While technically a GUID could be used in any environment, it was not practical to do so in terms of data entry and error detection. In order to have the ScottGuID be a true universal ID it must be usable in non-computing environments. Prior to the announcement of the ScottGuID there have been a number of until-now confidential projects. One of the tools that will soon become public is ScottGuIDGen, which is in essence an updated version of GUIDGEN that can create ScottGuIDs. The following screenshot shows a sample ScottGuID. To demonstrate the various applications of the ScottGuID there were test deployments around the globe. The following examples are a small showcase of the applications that have already been prototyped. Log in to Hotmail: Pay for gas: Sign in to Twitter: Dispense cat food: Conclusion I hope that this brief introduction to the ScottGuID shows how technology can continue to move forward, even when it appears there is a point that cannot be passed. With a small number of principles, a team of smart engineers, and a passion for "getting it right" the ScottGuID should last well past our lifetimes. In the coming months expect further announcements regarding additional developer tools, samples, whitepapers, podcasts, and videos. Please leave a comment on this post if you have any questions about the ScottGuID or what you would like to see us do with it. With ScottGuID, the possibilities are nearly endless and we want to stretch their reach as far as possible.

Read the article

Listen to Online Radio with Antenna

- by Asian Angel

Are you looking for some fresh new music to listen to at home or at work? With Antenna you can listen to online radio stations from all over the world. Note: Requires Adobe AIR (download link at bottom of article). Antenna in Action Once you have completed the installation and started Antenna up this is the window that you will see. The left side will have a “browsing pane” where you can search for the stations that you would like to listen to using the various categories. Based on the stations that you choose the background map will change location to match the stations locations. Here is a closer look at the “Categories Bar”. For our first example we used the “Country Category” to find our first station to listen to. When you choose a country you will be presented with a list of the stations available for that country. To start listening to a particular station just double click on the appropriate entry line. A closer look at the “browser pane” with our first station playing. Notice the “Reliability Indicator” that will be available for each listing…some may be better than others and you can use this to choose the best streaming stations from the list. In the upper left corner you will notice three icons…each will open a small pop-up window with a specific purpose. The first icon will open up the “About Window”. If you need to contact Antenna’s creator or would like to place a request for a station to be added to the app then this is the best way to do it. The second icon will open up a Antenna specific chat window. The third icon will allow you to set a default location and make adjustments to some of the app’s settings. Recording Audio The “Recording Function” is the only area where we experienced some “quirkiness” with the app. To start recording press the “Round White Button”… Note: Based on feedback on the app creator’s webpage some people have experienced the same problem as we did during our tests with the app failing to complete the recordings. Hopefully this bug will be fixed with the next release. Once recording has started the button will turn red. Click on the button again to stop recording. Once you have stopped recording you will see the following message window appear and the main window will be shaded over with a whitish color until you click “OK”. Conclusion Regardless of the slight quirkiness in recording online music Antenna more than makes up for it with the terrific selection of online stations and streaming capability. New fresh music for you to listen to is only a click or two away… Links Download Antenna (Antenna Homepage) Download Antenna at Softpedia Download Adobe AIR Similar Articles Productive Geek Tips Listen to Local FM Radio in Windows 7 Media CenterListen to Over 100,000 Radio Stations in Windows Media CenterListen To XM Radio with Windows Media Center in Windows 7Listen and Record Over 12,000 Online Radio Stations with RadioSureWeekend Fun: Watch Television on Your PC with AnyTV TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips DVDFab 6 Revo Uninstaller Pro Registry Mechanic 9 for Windows PC Tools Internet Security Suite 2010 Will it Blend? iPad Edition Penolo Lets You Share Sketches On Twitter Visit Woolyss.com for Old School Games, Music and Videos Add a Custom Title in IE using Spybot or Spyware Blaster When You Need to Hail a Taxi in NYC Live Map of Marine Traffic

Read the article

Content Challenge: You Can Only Get it Here

- by Mike Stiles

Part of the content conundrum for brands is figuring out what kind of content customers would find cool, desirable, and relevant. The mere fact many brands have no idea what this content might be is, in itself, pretty alarming. You’d have to have a pretty thorough lack of involvement with and understanding of your customers to not know what they might like. But despite what should be a great awakening in which consumers are using every technology and trick in the book to shield themselves from ads and commercials, brand self-obsession continues as marketers concentrate on their message, their campaign, what they want to say, and what they want social users to do. When individuals conduct themselves in that same fashion on Facebook and Twitter, it gets tiresome and starts losing value pretty quickly. Their posts eventually get hidden. Conversely, friends who post things that consistently entertain or inform, with little self-marketing desperation involved, win the coveted “show all updates” setting. Of course brands are going to use social to market. It’s pretty much the point of having social in the marketing mix. And yes, people who follow a brand’s Twitter account or “Like” a brand’s Facebook Page implicitly state they want to know what’s going on with that brand’s products and services. But if you have a Facebook friend that assumes you want every one of her posts to be about what wine she likes (Mitsubishi’s current campaign is even based around weeding out pretentious Facebook friends, then running them over), then you know how it must feel for your fans and followers to get a sales pitch for your crackers or whatever you’re selling every single time. Is there such a thing as content that doesn’t sell but that still advances the brand and makes the consumer more involved and valuable? Of course. And perhaps there are no better companies than enterprise brands to do it. Enterprise organizations are large enough to go beyond a product and engage readers/viewers at higher, broader levels…communicating expertise across entire sectors, subjects and industries. You’re going from pitchman to news source, and getting full credit for it as the presenter. A recent GigaOM article pointed out the success a San Francisco-based startup called Crunchyroll is having. Their niche (and they proudly admit it’s a niche) is providing Japanese anime, Korean drama and Asian live action content to countries that can’t get it any other way via licensing deals. Shows are available in HD and on the same day they air in the host country. Crunchyroll not only gets 8 million viewers a month, they have 100,000 paying subscribers at $7-12/month. Got a point, Mike? I do happen to have one. Crunchyroll illustrates the content opportunity enterprise companies have…which is to determine your “area,” the interest graph of your customers, then provide content that speaks to and satisfies those interests that can’t be found anywhere else. At least not in the same style, or of the same quality, or with the same authority. Do what no one else is doing. Provide what no one else is providing in your sector. If underserved users are willing to pay monthly for access to awkwardly moving cartoon dragons, imagine the audience you could attract with free, useful, non-sales content in your customers’ area of interest. It’s an audience you’ll want in place when the time does come to put out that marketing message. A content challenge is better than a content conundrum any day.

Read the article

Moving monarchs and dragons: migrating the JDK bugs to JIRA

- by darcy

Among insects, monarch butterflies and dragonflies have the longest migrations; migrating JDK bugs involves a long journey as well! As previously announced by Mark back in March, we've been working according to a revised plan to transition the JDK bug management from Sun's legacy system to initially an Oracle-internal JIRA instance which is afterward made visible and usable externally. I've been busily working on this project for the last few months and the team has made good progress on many aspects of the effort: JDK bugs will be imported into JIRA regardless of age; bugs will also be imported regardless of state, including closed bugs. Consequently, the JDK bug project will start pre-populated with over 100,000 existing bugs, some dating all the way back to 1994. This will allow a continuity of information and allow new issues to be linked to old ones. Using a custom import process, the Sun bug numbers will be preserved in JIRA. For example, the Sun bug with bug number 4040458 will become "JDK-4040458" in JIRA. In JIRA the project name, "JDK" in our case, is part of the bug's identifier. Bugs created after the JIRA migration will be numbered starting at 8000000; bugs imported from the legacy system have numbers ranging between 1000000 and 79999999. We're working with the bugs.sun.com team to try to maintain continuity of the ability to both read JDK bug information as well as to file new incidents. At least for now, the overall architecture of bugs.sun.com will be the same as it is today: it will be a gateway bridging to an Oracle-internal system, but the internal system will change to JIRA from the legacy database. Generally we are aiming to preserve the visibility of bugs currently viewable on bugs.sun.com; however, bugs in areas not related to the JDK will not be visible after the transition to JIRA. New incoming incidents will be sent to a separate JIRA project for initial triage before possibly being moved into the JDK project. JDK bug management leans heavily on being able to track the state of bugs in multiple releases, especially to coordinate delivering synchronized security releases (known as CPUs, critital patch updates, in Oracle parlance). For a security release, it is common for half a dozen or more release trains to be affected (for example, JDK 5, JDK 6 update, OpenJDK 6, JDK 7 update, JDK 8, virtual releases for HotSpot express, etc.). We've determined we need to track at least the tuple of (release, responsible engineer/assignee for the release, status in the release) for the release trains a fix is going into. To do this in JIRA, we are creating a separate port/backport issue type along with a custom link type to allow the multiple release information to be easily grouped and presented together. The Sun legacy system had a three-level classification scheme, product, category, and subcategory. Out of the box, JIRA only has a one-level classification, component. We've implemented a custom second-level classification, subcomponent. As part of the bug migration we've taken the opportunity to think about how bugs should be grouped under a two-level system and we'll the new system will be simpler and more regular. The main top-level components of the JDK product will include: core-libs client-libs deploy install security-libs other-libs tools hotspot For the libs areas, the primary name of the subcomportment will be the package of the API in question. In the core-libs component, there will be subcomponents like: java.lang java.lang.class_loading java.math java.util java.util:i18n In the tools component, subcomponents will primarily correspond to command names in $JDK/bin like, jar, javac, and javap. The first several bulk imports of the JDK bugs into JIRA have gone well and we're continuing to refine the import to have greater fidelity to the current data, including by reconstructing information not brought over in a structured fashion during the previous large JDK bug system migration back in 2004. We don't currently have a firm timeline of when the new system will be usable externally, but as it becomes available, I'll share further information in follow-up blog posts.

Read the article

Nominations now open for the Oracle FMW Excellence Awards 2014

- by Greg Jensen

2014 Oracle Excellence Award NominationsWho Is the Innovative Leader for Identity Management? • Is your organization leveraging one of Oracle’s Identity and Access Management solutions in your production environment?• Are you a leading edge organization that has adopted a forward thinking approach to Identity and Access Management processes across the organization?• Are you ready to promote and highlight the success of your deployment to your peers? • Would you a chance to win FREE registration to Oracle OpenWorld 2014? Oracle is pleased to announce the call for nominations for the 2014 Oracle Excellence Awards: Oracle Fusion Middleware Innovation. The Oracle Excellence Awards for Oracle Fusion Middleware Innovation honor organizations using Oracle Fusion Middleware to deliver unique business value. This year, the awards will recognize customers across nine distinct categories, including Identity and Access Management. Oracle customers, who feel they are pioneers in their implementation of at least one of the Oracle Identity and Access Management offerings in a production environment or active deployment, should submit a nomination. If submitted by June 20th, 2014, you will have a chance to win a FREE registration to Oracle OpenWorld 2014 (September 28 - October 2) in San Francisco, CA. Top customers will be showcased at Oracle OpenWorld and featured in Oracle publications. The Identity and Access Management Nomination Form Additional benefits to nomineesNominating your organization opens additional opportunities to partner with Oracle such as:• Promotion of your Customer Success StoriesProvides a platform for you to share the success of your initiatives and programs to peer groups raising the overall visibility of your team and your organization as a leader in security• Social Media promotion (Video, Blog & Podcast)Reach the masses of Oracle’s customers through sharing of success stories, or customer created blog content that highlights the advanced thought leadership role in security with co-authored articles on Oracle Blog page that reaches close to 100,000 subscribers. There are numerous options to promote activities on Facebook, Twitter and co-branded activities using Video and Audio. • Live speaking opportunities to your peersAs a technology leader within your organization, you can represent your organization at Oracle sponsored events (online, in person or webcasts) to help share the success of your organizations efforts building out your team/organization brand and success. • Invitation to the IDM Architect ForumOracle is able to invite the right customers into the IDM Architect Forum which is an invite only group of customers that meet monthly to hear technology driven presentations from their own peers (not from Oracle) on today’s trends. If you want to hear privately what some of the most successful companies in every industry are doing about security, this is the forum to be in. All presentations are private and remain within the forum, and only members can see take advantage of the lessons gained from these meetings. To date, there are 125 members. There are many more advantages to partnering with Oracle, however, it can start with the simple nomination form for Identity and Access Management category of the 2014 Oracle Excellence Award Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

Read the article

More Free Apps Bound for the Marketplace

- by Scott Kuhl

Microsoft has announced they are raising the limit of free applications a developer can submit from 5 to 100. But what does that really mean? First, lets look at the reason for the limitation. The iTunes Store and the Android Market both have a lot more applications available than the Windows Phone Marketplace. But that says nothing about the quality of those applications. I attended a couple of pre-launch events and Microsoft representatives were clearly told to send a message. We don’t want a bunch of junky applications that do nothing but spam the marketplace. That was the reason for the 5 free application limit. Okay, so now what has the result been? Well, there are still fart apps, but there is no sign of a developer flooding the marking with 1500 wallpaper applications or 1000 of the same application all pointed at different RSS feeds. On the other hand there are developers who want to release real free apps but are constrained by the 5 app limit. So why did Microsoft change it’s mind? Is it to get the count of applications up, or is to make developers happy? Windows Phone Marketplace is growing fast but it’s a long way behind the other guys. I don’t think Microsoft wants to have 100,000 apps show up in the next 3 months if they are loaded with copy cat apps. Those numbers will get picked apart quickly and the press will start complaining about the same problems the Android Market has. I do think the bump was at developer request. Microsoft is usually good about listening to developer feedback, but has been pretty slow about it at times. And from a financial perspective, there will me more apps that Microsoft has to review that they will see no profit on. At least not until they bake in a advertising model connected to Bing. Ultimately, what does this mean for the future? Well, there are developers out there looking to release more than 5 simple free apps, so I think we will see more hobby apps. And there are developers out there trying to make money from advertising instead of sales, so I think we will see more of those also. But the category that I think will grow the fastest is free versions of paid applications that are the same as the trial version of the application. While technically that makes no sense, its purely a marketing move. Free apps get downloaded a lot more than paid apps, even with a trial mode. It always surprises me how little consumers are willing to spend on mobile apps. How many reviews of applications have you seen that says something like “a bit pricey at $1.99”. Really? Have you looked at how much you spend on your phone and plan? I always thought the trial mode baked into Windows Marketplace was a good idea. So I’m not sure how the more open free market will play out. In the long run though, I won’t be surprised to see a Bing ad mobile ad model show up so Microsoft can capitalize on the more open and free Windows Marketplace. Bonus: The Oatmeal on How I Feel About Buying Apps

Read the article

Let Me Show You Something: Instagram, Vine and Snapchat for Brands

- by Mike Stiles

While brands are well aware of how much more impactful images are than text-only posts on social channels, today you’re additionally being presented with platform after additional platform for hosting, doctoring and sharing photos and videos. Can you play in every sandbox? And if you do, can you be brilliant on all of them? As has usually been the case, so far brands are sticking their toes into new platforms while not actually committing to them, or strategizing for them, or resourcing them. TrackMaven found of the 123 F500 companies using Instagram, only 22% of them are active on it. Likewise, research from Simply Measured found brands are indeed jumping in, with the number establishing a presence on Instagram up 55% over the past year. Users want them there…brand engagement has exploded 350%, and over 1/3 of the top brands have at least 10,000 followers. BUT…the top 10 brands are generating 33% of all posts, reaping 83% of all engagement. Things are also growing on Twitter’s Vine, the 6-second looping video app that hit 40 million users in August. The 7th Chamber says 5 tweets a second contain a Vine link. Other studies say branded Vines are 4 times more likely to be shared and seen than rank-and-file branded videos. Why? Users know that even if a video is pure junk, they won’t get robbed of too much of their valuable time. Vine is always upgrading so you can make sure your videos are worth viewers’ time. You can now edit videos, and save & work on several projects concurrently. What you can’t do is upload a finely crafted video into Vine, but you can do that with Instagram. The key to success? Same as with all other content; make it of value. Deliver a laugh or a lesson or both. How-to, behind the scenes peeks, contests, demos, all make sense in the short video format. Or follow Nash Grier’s example, which is to just have fun with and connect to your viewers, earning their trust that your next Vine will be as good as the last. Nash is only 15, has over 1.4 million followers, and adds about 100,000 a week. He broke out when one of his videos was re-Vined by some other kid with 300,000 followers. Make good stuff, get it in front of influencers, and your brand Vines could break out as well. Then there’s Snapchat, the “this photo will self destruct” platform. How can that be of use to brands besides offering coupons that really expire? The jury is out. But with an audience of over 100 million and a valuation of $800 million, media-with-a-time-limit is compelling. Now there’s “Snapchat Stories” that can last 24 hours and be shared to the public at large. You might be able to capitalize on how much more focus gets put on content when there’s a time limit on its availability. The underlying truth to all of this is, these are all tools. Very cool, feature rich tools, but tools. You can give the exact same art kit to 5 different people and you’d get back 5 very different works, ranging from worthless garbage to masterpiece. Brands are being called upon to be still and moving image artists. That’s what your customers are used to seeing, from a variety of sources. Commit to communicating with them accordingly. @mikestiles Photo: stock.xchng

Search Results

Search found 487 results on 20 pages for '100000'.

Page 18/20 | < Previous Page | 14 15 16 17 18 19 20 | Next Page >

- by user248237

- by danpalmer

- by Mike

- by The Lazy DBA

- by drew

- by Gart

- by _ande_turner_

- by Kirk

- by yCalleecharan

- by SlappyTheFish

- by SkylineAddict

- by d0ugal

- by ideasculptor

- by symcbean

- by Wayne

- by Shaul

- by Mark

- by user1905080

- by Eilon

- by Asian Angel

- by Mike Stiles

- by darcy

- by Greg Jensen

- by Scott Kuhl

- by Mike Stiles

< Previous Page | 14 15 16 17 18 19 20 | Next Page >