average joe - Page 122 - Developer IT

JMS message. Model to include data or pointers to data?

- by John

I am trying to resolve a design difference of opinion where neither of us has experience with JMS. We want to use JMS to communicate between a j2ee application and the stand-alone application when a new event occurs. We would be using a single point-to-point queue. Both sides are Java-based. The question is whether to send the event data itself in the JMS message body or to send a pointer to the data so that the stand-alone program can retrieve it. Details below. I have a j2ee application that supports data entry of new and updated persons and related events. The person records and associated events are written to an Oracle database. There are also stand-alone, separate programs that contribute new person and event records to the database. When a new event occurs through any of 5-10 different application functions, I need to notify remote systems through an outbound interface using an industry-specific standard messaging protocol. The outbound interface has been designed as a stand-alone application to support scalability through asynchronous operation and by moving it to a separate server. The j2ee application currently has most of the data in memory at the time the event is entered. The data would consist of approximately 6 different objects; a person object and some with multiple instances for an average size in the range of 3000 to 20,000 bytes. Some special cases could be many times this amount. From a performance and reliability perspective, should I model the JMS message to pass all the data needed to create the interface message, or model the JMS message to contain record keys for the data and have the stand-alone Java application retrieve the data to create the interface message?

Read the article

Python : How do you find the CPU consumption for a piece of code?

- by Yugal Jindle

Background: I have a django application, it works and responds pretty well on low load, but on high load like 100 users/sec, it consumes 100% CPU and then due to lack of CPU slows down. Problem : Profiling the application gives me time taken by functions. This time increases on high load. Time consumed may be due to complex calculation or for waiting for CPU. so, how to find the CPU cycles consumed by a piece of code ? Since, reducing the CPU consumption will increase the response time. I might have written extremely efficient code and need to add more CPU power OR I might have some stupid code taking the CPU and causing the slow down ? Any help is appreciated ! Update: I am using Jmeter to profile my webapp, it gives me a throughput of 2 requests/sec. [ 100 users] I get a average time of 36 seconds on 100 request vs 1.25 sec time on 1 request. More Info Configuration Nginx + Uwsgi with 4 workers No database used, using a responses from a REST API On 1st hit the response of REST API gets cached, therefore doesn't makes a difference. Using ujson for json parsing. Curious to Know: Python-Django is used by so many orgs for so many big sites, then there must be some high end Debug / Memory-CPU analysis tools. All those I found were casual snippets of code that perform profiling.

Read the article

Know the row with max characters (C)

- by l_core

I have wrote a program in C, to find the row with the max number of characters. Here is the code #include <stdio.h> #include <stdlib.h> #include <ctype.h> #include <string.h> int main (int argc, char *argv[]) { char c; /* used to store the character with getc */ int c_tot = 0, c_rig = 0, c_max = 0; /* counters of characters*/ int r_tot = 0; /* counters of rows */ FILE *fptr; fptr = fopen(argv[1], "r"); if (fptr == NULL || argc != 2) { printf ("Error opening the file %s\n'", argv[1]); exit(EXIT_FAILURE); } while ( (c = getc(fptr)) != EOF) { if (c != ' ' && c != '\n') { c_tot++; c_rig++; } if (c == '\n') { r_tot++; if (c_rig > c_max) c_max = c_rig; c_rig = 0; } } printf ("Total rows: %d\n", r_tot); printf ("Total characters: %d\n", c_tot); printf ("Total characters in a row: %d\n", c_max); printf ("Average number of characters on a row: %d\n", (c_tot/r_tot)); printf ("The row with max characters is: %s\n", ??????) return 0; } I can easily find the row with the highest number of characters but how can i print that out? Thank You Folks

Read the article

Is there a Java data structure that is effectively an ArrayList with double indicies and built-in in

- by Bob Cross

I am looking for a pre-built Java data structure with the following characteristics: It should look something like an ArrayList but should allow indexing via double-precision rather than integers. Note that this means that it's likely that you'll see indicies that don't line up with the original data points (i.e., asking for the value that corresponds to key "1.5"). As a consequence, the value returned will likely be interpolated. For example, if the key is 1.5, the value returned could be the average of the value at key 1.0 and the value at key 2.0. The keys will be sorted but the values are not ensured to be monotonically increasing. In fact, there's no assurance that the first derivative of the values will be continuous (making it a poor fit for certain types of splines). Freely available code only, please. For clarity, I know how to write such a thing. In fact, we already have an implementation of this and some related data structures in legacy code that I want to replace due to some performance and coding issues. What I'm trying to avoid is spending a lot of time rolling my own solution when there might already be such a thing in the JDK, Apache Commons or another standard library. Frankly, that's exactly the approach that got this legacy code into the situation that it's in right now.... Is there such a thing out there in a freely available library?

Read the article

MATLAB, time match filter

- by Paul

OK, I am still getting the hang of MATLAB. I have two files in different format. One Excel file. data1.xls, size= 86400 X 62. It looks like: Date/Time par1 par2 par3 par4 par5 par6 par6 par7 par8 par9 08/02/09 00:06:45 0 3 27 9.9 -133.2 0 0 0 1 0 Another file, data2.csv, size = 144 X 27. (If nothing is missing.) It looks like: date time P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11 8/16/2009 0:00 51 45 46 54 53 52 524 5 399 89 78 Now I am using Data10minAvg = mean(reshape(Data,300,144,62)); to get the 10 min average of the first Excel file. Now I need to match up that file I am making above with the .csv file. The problem is many timestamps are missing in the .csv file. How do I make data2.csv into a file of size 144 X 27, replacing the missing datestamps by rows of zero? It will really help me than compare data1.xls file with newdata2.csv.

Read the article

Dealing with large number of text strings

- by Fadrian

My project when it is running, will collect a large number of string text block (about 20K and largest I have seen is about 200K of them) in short span of time and store them in a relational database. Each of the string text is relatively small and the average would be about 15 short lines (about 300 characters). The current implementation is in C# (VS2008), .NET 3.5 and backend DBMS is Ms. SQL Server 2005 Performance and storage are both important concern of the project, but the priority will be performance first, then storage. I am looking for answers to these: Should I compress the text before storing them in DB? or let SQL Server worry about compacting the storage? Do you know what will be the best compression algorithm/library to use for this context that gives me the best performance? Currently I just use the standard GZip in .NET framework Do you know any best practices to deal with this? I welcome outside the box suggestions as long as it is implementable in .NET framework? (it is a big project and this requirements is only a small part of it) EDITED: I will keep adding to this to clarify points raised I don't need text indexing or searching on these text. I just need to be able to retrieve them in later stage for display as a text block using its primary key. I have a working solution implemented as above and SQL Server has no issue at all handling it. This program will run quite often and need to work with large data context so you can imagine the size will grow very rapidly hence every optimization I can do will help.

Read the article

APC decreasing php performance??? (php 5.3, apache 2.2, windows vista 64bit)

- by M.M.

Hi, I have an Apache/2.2.15 (VC9) and PHP/5.3.2 (VC9 thread safe) running as an apache module on Vista 64bit machine. All running fine. Project that I'm benchmarking (with apache's ab utility) is basically standard Zend Framework project with no db connection involved. Average (median) apache response is about 0.15 seconds. After I've installed APC (3.1.4-dev VC9 thread safe) with standard settings suddenly the request response time raised to 1.3 seconds (!), which is unacceptable... All apc settings looked always good (through the apc.php script: enough shm memory, no cache full, fragmentation 0%). Only difference was to disable the stats lookup (apc.stat = 0). Then the response dropped to 0.09 seconds which was finally better than without the apc. IIRC, it's expected and obvious that the stat lookup creates some overhead, but shouldn't it still be far more performant compared to running wihout the apc extension at all? Or put it differently why is the apc.stat creating so much overhead? Apparently, something is not working as it should, I don't really know where to start looking. Thank you for your time/answers/direction in advance. Cheers, m.

Read the article

Explaining a Ruby code snippet

- by Michael Foukarakis

I'm in that uncomfortable position again, where somebody has left me with a code snippet in a language I don't know and I have to maintain it. While I haven't introduced Ruby to myself some parts of it are quite simple, but I'd like to hear your explanations nonetheless. Here goes: words = File.open("lengths.txt") {|f| f.read }.split # read all lines of a file in 'words'? values = Array.new(0) words.each { |value| values << value.to_i } # looked this one up, it's supposed to convert to an array of integers, right? values.sort! values.uniq! diffs = Array.new(0) # this looks unused, unless I'm missing something obvious sum = 0 s = 0 # another unused variable # this looks like it's computing the sum of differences between successive # elements, but that sum also remains unused, or does it? values.each_index { |index| if index.to_i < values.length-1 then sum += values.at(index.to_i + 1) - values.at(index.to_i) end } # could you also explain the syntax here? puts "delta has the value of\n" # this will eventually print the minimum of the original values divided by 2 puts values.at(0) / 2 The above script was supposed to figure out the average of the differences between every two successive elements (integers, essentially) in a list. Am I right in saying this is nowhere near what it actually does, or am I missing something fundamental, which is likely considering I have no Ruby knowledge?

Read the article

A basic load test question

- by user236131

I have a very basic load test question. I am running a load test using VSTS 2008 and I have test rig with controller + 10 agents. This load test is against a SharePoint farm I have. My goal of the load test is to find out the resource utilization on web+app+db tiers of my farm for any given load scenario. An example of a load scenario is Usage profile: Average collaboration (as defined by SCCP) User Load: 500 (using step load pattern=a step of 50 every 2 mins and a warm up time of 2mins for every step) Think time: 0 Load duration: 8hrs Now, the question is: Is it fair to expect that metrics like Requests/sec, %processor time on web front end / App / DB, Test/sec, and etc become flat or enter a steady state at one point in time during the load test. Like I said, the goal is not to create a bottleneck but to only measure the utilization of resources by the above load profile. I am asking this question because I see something different. At one point in the load test, requests/sec becomes more or less flat. But processor utilization on the web/DB servers keeps increasing. After digging through the data a bit, I see that "tests running" counter also steadily increased over time. So, if I run the load test for more than 8hrs, %processor may go up further. This way, I don't know what to consider as the load excreted by the load profile. What does this "tests running" counter really signify? How is this different from tests/sec? Another question is: how can I find out why "tests running" counter shows an increase overtime? Thanks for your time

Read the article

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

- by EmbeddedProg

I found this article here: Quantifying the Performance of Garbage Collection vs. Explicit Memory Management http://www.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf In the conclusion section, it reads: Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management. So, if my understanding is correct: if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?) Do you know if current (i.e. latest Java VM's and .NET 4.0's) garbage collectors suffer the same problems described in the aforementioned article? Has the performance of modern garbage collectors improved? Thanks.

Read the article

Optimizing near-duplicate value search

- by GApple

I'm trying to find near duplicate values in a set of fields in order to allow an administrator to clean them up. There are two criteria that I am matching on One string is wholly contained within the other, and is at least 1/4 of its length The strings have an edit distance less than 5% of the total length of the two strings The Pseudo-PHP code: foreach($values as $value){ foreach($values as $match){ if( ( $value['length'] < $match['length'] && $value['length'] * 4 > $match['length'] && stripos($match['value'], $value['value']) !== false ) || ( $match['length'] < $value['length'] && $match['length'] * 4 > $value['length'] && stripos($value['value'], $match['value']) !== false ) || ( abs($value['length'] - $match['length']) * 20 < ($value['length'] + $match['length']) && 0 < ($match['changes'] = levenshtein($value['value'], $match['value'])) && $match['changes'] * 20 <= ($value['length'] + $match['length']) ) ){ $matches[] = &$match; } } } I've tried to reduce calls to the comparatively expensive stripos and levenshtein functions where possible, which has reduced the execution time quite a bit. However, as an O(n^2) operation this just doesn't scale to the larger sets of values and it seems that a significant amount of the processing time is spent simply iterating through the arrays. Some properties of a few sets of values being operated on Total | Strings | # of matches per string | | Strings | With Matches | Average | Median | Max | Time (s) | --------+--------------+---------+--------+------+----------+ 844 | 413 | 1.8 | 1 | 58 | 140 | 593 | 156 | 1.2 | 1 | 5 | 62 | 272 | 168 | 3.2 | 2 | 26 | 10 | 157 | 47 | 1.5 | 1 | 4 | 3.2 | 106 | 48 | 1.8 | 1 | 8 | 1.3 | 62 | 47 | 2.9 | 2 | 16 | 0.4 | Are there any other things I can do to reduce the time to check criteria, and more importantly are there any ways for me to reduce the number of criteria checks required (for example, by pre-processing the input values), since there is such low selectivity?

Read the article

How security of the systems might be improved using database procedures?

- by Centurion

The usage of Oracle PL/SQL procedures for controlling access to data often emphasized in PL/SQL books and other sources as being more secure approach. I'v seen several systems where all business logic related with data is performed through packages, procedures and functions, so application code becomes quite "dumb" and is only responsible for visualization part. I even heard some devs call such approaches and driving architects as database nazi :) because all logic code resides in database. I do know about DB procedure performance benefits, but now I'm interested in a "better security" when using thick client model. I assume such design mostly used when Oracle (and maybe MS SQL Server) databases are used. I do agree such approach improves security but only if there are not much users and every system user has a database account, so we might control and monitor data access through standard database user security. However, how such approach could increase the security for an average web system where thick clients are used: for example one database user with DML grants on all tables, and other users are handled using "users" and"user_rights" tables? We could use DB procedures, save usernames into context use that for filtering but vulnerability resides at the root - if the main database account is compromised than nothing will help. Of course in a real system we might consider at least several main users (for example frontend_db_user, backend_db_user).

Read the article

Speed up csv export when using php from mysql database query

- by John

Ok, so i've got a web system (built on codeigniter & running on mysql) that allows people to query a database of postal address data by making selections in a series of forms until they arrive at the selection that want, pretty standard stuff. They can then buy that information and download it via that system. The queries run very fast, but when it comes to applying that query to the database,and exporting it to csv, once the datasets get to around the 30,000 record mark (each row has around 40 columns of which about 20 are all populated with on average 20 chars of data per cell) it can take 5 or so minutes to export to csv. So, my question is, what is the main cause for the slowness? Is it that the resultset of data from the query is so large, that it is running into memory issues? Therefore should i allow much more memory to the process? Or, is there a much more efficient way of exporting to csv from a mysql query that i'm not doing? Should i save the contents of the query to a temp table and simply export the temp table to csv? Or am i going about this all wrong? Also, is the fact that i'm using Codeigniters Active Record for this prohibitive due to the way that it stores the resultset? Any advice is welcome! Thank you for reading!

Read the article

MDX: Filtering a member set by a measure's table values

- by oyvinro

I have some numbers in a fact table, and have generated a measure which use the SUM aggregator to summarize the numbers. But the problem is that I only want to sum the numbers that are higher than, say 10. I tried using a generic expression in the measure definition, and that works of course, but the problem is that I need to be able to dynamically set that value, because it's not always 10, meaning users should be able to select it themselves. More specifically, my current MDX looks like this: WITH SET [Email Measures] AS '{[Measures].[Number Of Answered Cases], [Measures].[Max Expedition Time First In Case], [Measures].[Avg Expedition Times First In Case], [Measures].[Number Of Incoming Email Requests], [Measures].[Avg Number Of Emails In Cases], [Measures].[Avg Expedition Times Total],[Measures].[Number Of Answered Incoming Emails]}' SET [Organizations] AS '{[Organization.Id].[860]}' SET [Operators] AS '{[Operator.Id].[3379],[Operator.Id].[3181]}' SET [Email Accounts] AS '{[Email Account.Id].[6]}' MEMBER [Time.Date].[Date Period] AS Aggregate ({[Time.Date].[2008].[11].[11] :[Time.Date].[2009].[1].[2] }) MEMBER [Email.Type].[Email Types] AS Aggregate ({[Email.Type].[0]}) SELECT {[Email Measures]} ON columns, [Operators] ON rows FROM [Email_Fact] WHERE ( [Time.Date].[Date Period] ) Now, the member in question is the calculated member [Avg Expedition Times Total]. This member takes in two measures; [Sum Expedition Times] and [Nr of Expedition Times] and splits one on the other to get the average, all this presently works. However, I want [Sum Expedition Times] to only summarize values over or under a parameter of my/the user's wish. How do I filter the numbers [Sum Expedition Times] iterates through, rather than filtering on the sum that the measure gives me in the end?

Read the article

How to handle product ratings in a database

- by Mel

Hello, I would like to know what is the best approach to storing product ratings in a database. I have in mind the following two (simplified, and assuming a MySQL db) scenarios: Scenario 1: Create two columns in the product table to store number of votes and the sum of all votes. Use columns to get an average on the product display page: products(productedID, productName, voteCount, voteSum) Pros: I will only need to access one table, and thus execute one query to display product data and ratings. Cons: Write operations will be executed in a table whose original purpose is only to furnish product data. Scenario 2: Create an additional table to store ratings. products(productID, productName) ratings(productID, voteCount, voteSum) Pros: Isolate ratings into a separate table, leaving the products table to furnish data on available products. Cons: I will have to execute two separate queries on product page requests (one for data and another for ratings). In terms of performance, which of the following two approaches is best: Allow users to execute an occasional write query to a table that will handle hundreds of read requests? Execute two queries at every product page, but isolate the write query into a separate table. I'm a novice to database development, and often find myself struggling with simple questions such as these. Many thanks,

Read the article

How important is it that models be consistent across project components?

- by RonLugge

I have a project with two components, a server-side component and a client-side component. For various reasons, the client-side device doesn't carry a fully copy of the database around. How important is it that my models have a 1:1 correlation between the two sides? And, to extend the question to my bigger concern, are there any time-bombs I'm going to run into down the line if they don't? I'm not talking about having different information on each side, but rather the way the information is encapsulated will vary. (Obviously, storage mechanisms will also vary) The server side will store each user, each review, each 'item' with seperate tables, and create links between them to gather data as necessary. The client side shouldn't have a complete user database, however, so rather than link against the user for gathering things like 'name', I'd store that on the review. In other words... --- Server Side --- Item: +id //Store stuff about the item User: +id +Name -Password Review: +id +itemId +rating +text +userId --- Device Side --- Item: +id +AverageRating Review: +id +rating +text +userId +name User: +id +Name //Stuff The basic idea is that certain 'critical' information gets moved one level 'up'. A user gets the list of 'items' relevant to their query, with certain review-orientation moved up (i. e. average rating). If they want more info, they query the detail view for the item, and the actual reviews get queried and added to the dataset (and displayed). If they query the actual review, the review gets queried and they pick up some additional user info along the way (maybe; I'm not sure if the user would have any use for any of the additional user information). My basic concern is that I don't wan't to glut the user's bandwidth or local storage with a huge variety of information that they just don't need, even if proper database normalizations suggests that information REALLY should be stored at a 'lower' level. I've phrased this as a fairly low-level conceptual issue because that's the level I'm trying to think / worry over, but if it matters I'm creating a PHP / MySQL server that provides data for a iOS / CoreData client.

Read the article

Python "callable" attribute (pseudo-property)

- by mgilson

In python, I can alter the state of an instance by directly assigning to attributes, or by making method calls which alter the state of the attributes: foo.thing = 'baz' or: foo.thing('baz') Is there a nice way to create a class which would accept both of the above forms which scales to large numbers of attributes that behave this way? (Shortly, I'll show an example of an implementation that I don't particularly like.) If you're thinking that this is a stupid API, let me know, but perhaps a more concrete example is in order. Say I have a Document class. Document could have an attribute title. However, title may want to have some state as well (font,fontsize,justification,...), but the average user might be happy enough just setting the title to a string and being done with it ... One way to accomplish this would be to: class Title(object): def __init__(self,text,font='times',size=12): self.text = text self.font = font self.size = size def __call__(self,*text,**kwargs): if(text): self.text = text[0] for k,v in kwargs.items(): setattr(self,k,v) def __str__(self): return '<title font={font}, size={size}>{text}</title>'.format(text=self.text,size=self.size,font=self.font) class Document(object): _special_attr = set(['title']) def __setattr__(self,k,v): if k in self._special_attr and hasattr(self,k): getattr(self,k)(v) else: object.__setattr__(self,k,v) def __init__(self,text="",title=""): self.title = Title(title) self.text = text def __str__(self): return str(self.title)+'<body>'+self.text+'</body>' Now I can use this as follows: doc = Document() doc.title = "Hello World" print (str(doc)) doc.title("Goodbye World",font="Helvetica") print (str(doc)) This implementation seems a little messy though (with __special_attr). Maybe that's because this is a messed up API. I'm not sure. Is there a better way to do this? Or did I leave the beaten path a little too far on this one? I realize I could use @property for this as well, but that wouldn't scale well at all if I had more than just one attribute which is to behave this way -- I'd need to write a getter and setter for each, yuck.

Read the article

Declaring arrays in c language without initial size

- by user2534857

this is the question-- Write a program to manipulate the temperature details as given below. - Input the number of days to be calculated. – Main function - Input temperature in Celsius – input function - Convert the temperature from Celsius to Fahrenheit.- Separate function - find the average temperature in Fahrenheit. how can I make this program without initial size of array ?? #include<stdio.h> #include<conio.h> void input(int); int temp[10]; int d; void main() { int x=0; float avg=0,t=0; printf("\nHow many days : "); scanf("%d",&d); input(d); conv(); for(x=0;x<d;x++) { t=t+temp[x]; } avg=t/d; printf("Avarage is %f",avg); getch(); } void input(int d) { int x=0; for(x=0;x<d;x++) { printf("Input temperature in Celsius for #%d day",x+1); scanf("%d",&temp[x]); } } void conv() { int x=0; for(x=0;x<d;x++) { temp[x]=1.8*temp[x]+32; } }

Read the article

Python2.7: How can I speed up this bit of code (loop/lists/tuple optimization)?

- by user89

I repeat the following idiom again and again. I read from a large file (sometimes, up to 1.2 million records!) and store the output into an SQLite databse. Putting stuff into the SQLite DB seems to be fairly fast. def readerFunction(recordSize, recordFormat, connection, outputDirectory, outputFile, numObjects): insertString = "insert into NODE_DISP_INFO(node, analysis, timeStep, H1_translation, H2_translation, V_translation, H1_rotation, H2_rotation, V_rotation) values (?, ?, ?, ?, ?, ?, ?, ?, ?)" analysisNumber = int(outputPath[-3:]) outputFileObject = open(os.path.join(outputDirectory, outputFile), "rb") outputFileObject, numberOfRecordsInFileObject = determineNumberOfRecordsInFileObjectGivenRecordSize(recordSize, outputFileObject) numberOfRecordsPerObject = (numberOfRecordsInFileObject//numberOfObjects) loop1StartTime = time.time() for i in range(numberOfRecordsPerObject ): processedRecords = [] loop2StartTime = time.time() for j in range(numberOfObjects): fout = outputFileObject .read(recordSize) processedRecords.append(tuple([j+1, analysisNumber, i] + [x for x in list(struct.unpack(recordFormat, fout))])) loop2EndTime = time.time() print "Time taken to finish loop2: {}".format(loop2EndTime-loop2StartTime) dbInsertStartTime = time.time() connection.executemany(insertString, processedRecords) dbInsertEndTime = time.time() loop1EndTime = time.time() print "Time taken to finish loop1: {}".format(loop1EndTime-loop1StartTime) outputFileObject.close() print "Finished reading output file for analysis {}...".format(analysisNumber) When I run the code, it seems that "loop 2" and "inserting into the database" is where most execution time is spent. Average "loop 2" time is 0.003s, but it is run up to 50,000 times, in some analyses. The time spent putting stuff into the database is about the same: 0.004s. Currently, I am inserting into the database every time after loop2 finishes so that I don't have to deal with running out RAM. What could I do to speed up "loop 2"?

Read the article

Using data.table to aggregate

- by dayne

After multiple suggestions from SO users, I am finally trying to convert my code over to using data.tables. library(data.table) DT <- data.table(plate = paste0("plate",rep(1:2,each=5)), id = rep(c("CTRL","CTRL","ID1","ID2","ID3"),2), val = 1:10) > DT plate id val 1: plate1 CTRL 1 2: plate1 CTRL 2 3: plate1 ID1 3 4: plate1 ID2 4 5: plate1 ID3 5 6: plate2 CTRL 6 7: plate2 CTRL 7 8: plate2 ID1 8 9: plate2 ID2 9 10: plate2 ID3 10 What I would like to do is take the average of DT[,val] by plate when the id is "CTRL". I would normally aggregate the data frame, then use match to map the values back to a new column, 'ctrl'. Using the data.table package I can get: DT[id=="CTRL",ctrl:=mean(val),by=plate] > DT plate id val ctrl 1: plate1 CTRL 1 1.5 2: plate1 CTRL 2 1.5 3: plate1 ID1 3 NA 4: plate1 ID2 4 NA 5: plate1 ID3 5 NA 6: plate2 CTRL 6 6.5 7: plate2 CTRL 7 6.5 8: plate2 ID1 8 NA 9: plate2 ID2 9 NA 10: plate2 ID3 10 NA What I need is really: DT <- data.table(plate = paste0("plate",rep(1:2,each=5)), id = rep(c("CTRL","CTRL","ID1","ID2","ID3"),2), val = 1:10, ctrl = rep(c(1.5,6.5),each=5)) > DT plate id val ctrl 1: plate1 CTRL 1 1.5 2: plate1 CTRL 2 1.5 3: plate1 ID1 3 1.5 4: plate1 ID2 4 1.5 5: plate1 ID3 5 1.5 6: plate2 CTRL 6 6.5 7: plate2 CTRL 7 6.5 8: plate2 ID1 8 6.5 9: plate2 ID2 9 6.5 10: plate2 ID3 10 6.5 Eventually I would like to use much more complicated selections of the values, but I do not know how to select specific values, run some function, then map those values back to the appropriate row using data frames.

Read the article

What are the potential problems with exposing the Facebook API secret?

- by genehack

I'm writing a little web utility that posts status updates to Twitter and/or Facebook. That involved creating 'applications' with both those services in order to get API keys and 'secrets'. My question is how protected I really need to keep those secrets -- in order for this to work at all, you seem to need the secret to interact with the authentication part of the service to grant the app access to your account and/or grant it permission to post updates on your behalf. Facebook's documentation says to protect the secret, but at least one other Facebook utility distributes the API key and secret in the source. It's important to note: this isn't your standard Facebook 'application' that runs within the context of Facebook, nor is it a standard "desktop"-style compiled app -- it's a web-based application intended to be run on your own web server. The audience for this is probably small and somewhat more sophisticated than average -- so, one technical alternative would be to require people to obtain their own API key and secret to use the app. That seems like a lot of work, however, and a fairly large barrier to entry to anybody using this. Anybody know or have any insight on what sort of trouble I'm letting myself in for if I put both the secrets and the API keys in the config for my app and check it into Github for all the world to see?

Read the article

memcached: which is faster, doing an add (and checking result), or doing a get (and set when returni

- by Mike Sherov

The title of this question isn't so clear, but the code and question is straightforward. Let's say I want to show my users an ad once per day. To accomplish this, every time they visit a page on my site, I check to see if a certain memcache key has any data stored on it. If so, don't show an ad. If not, store the value '1' in that key with an expiration of 86400. I can do this 2 ways: //version a $key='OPD_'.date('Ymd').'_'.$type.'_'.$user; if($memcache->get($key)===false){ $memcache->set($key,'1',false,$expire); //show ad } //version b $key='OPD_'.date('Ymd').'_'.$type.'_'.$user; if($memcache->add($key,'1',false,$expire)){ //show ad } Now, it might seem obvious that b is better, it always makes 1 memcache call. However, what is the overhead of "add" vs. "get"? These aren't the real comparisons... and I just made up these numbers, but let's say 1 add ~= 1 set ~= 5 get in terms of effort, and the average user views 5 pages a day: a: (5 get * 1 effort) + (1 set * 5 effort) = 10 units of effort b: (5 add * 5 effort) = 25 units of effort Would it make sense to always do the add call? Is this an unnecessary micro-optimization?

Read the article

How many hours a day (of the standard 8) do you actually work? [closed]

- by someone

Possible Duplicate: How much do you [really] work a day When I started working (not so long ago), I was very conscientious about really working. If I didn't work for 10 minutes at a time, I felt like I was cheating. But as I started to look around me, I realized that I was the only one... and most of my coworkers were spending a big percentage of their time browsing the internet or playing solitaire. I started to slack off a little more than usual... while still basically getting all my work done. But while I do all that's required of me, and usually quickly, I no longer beg for work to fill up my spare time; I'm content to do what I'm told and play around when no one makes sure I'm busy enough. Which means that I'm often bored and underutilized. (Which I was even when I begged for work - people are pretty laid back about the workload and don't seem to realize how much I can get done if pushed to the fullest.) But I was just talking to a friend who graduated with me and also recently started working... and she came to me with the same concerns about slacking. She's working remotely, which means there are often gaps in communication when she can't really get anything done... And she's feeling guilty about it. Which made me rethink the whole thing... So, as workers, how many hours, out of the 8 standard average, are you actually working (honestly)? And, as bosses, how many hours do you expect your workers to work? And from an ethical standpoint, how much free time, or space out time, can workers have during the day without being considered to be "cheating" their office of labor and money?

Read the article

How to learn a new programming language? And how to choose the appropriate language?

- by Sebi

Unfortunately my last question was closed, so I reformulated the question: I know this question was here a lot of times and can't be answered at all, but im not looking for a single name, but rather for an advice in my situation. I learned programming with Java and now I'm developing in Java for more or less 5 years (at the university) and I thinks my programming skills their are really ok/average. I have also small experience in C/C++ and C#. Now I have some spare time and I'd like to learn a new language or deepen the knowledge of Java/C/C++. But how to choose the right language to learn? I'd like to learn a language which will be usefull in the future concerning working in a software development business? I know there is no single answer, but I'm sure you could mention some languages that are more usefull than others. And what's the best way to learn such a language efficiently (bearing in mind that one has already learned some other languages)? Just doing tutorials? Or trying to implement a project? Trying to move a Java project to the new language?

Read the article

C#: Does an IDisposable in a Halted Iterator Dispose?

- by James Michael Hare

If that sounds confusing, let me give you an example. Let's say you expose a method to read a database of products, and instead of returning a List<Product> you return an IEnumerable<Product> in iterator form (yield return). This accomplishes several good things: The IDataReader is not passed out of the Data Access Layer which prevents abstraction leak and resource leak potentials. You don't need to construct a full List<Product> in memory (which could be very big) if you just want to forward iterate once. If you only want to consume up to a certain point in the list, you won't incur the database cost of looking up the other items. This could give us an example like: 1: // a sample data access object class to do standard CRUD operations. 2: public class ProductDao 3: { 4: private DbProviderFactory _factory = SqlClientFactory.Instance 5: 6: // a method that would retrieve all available products 7: public IEnumerable<Product> GetAvailableProducts() 8: { 9: // must create the connection 10: using (var con = _factory.CreateConnection()) 11: { 12: con.ConnectionString = _productsConnectionString; 13: con.Open(); 14: 15: // create the command 16: using (var cmd = _factory.CreateCommand()) 17: { 18: cmd.Connection = con; 19: cmd.CommandText = _getAllProductsStoredProc; 20: cmd.CommandType = CommandType.StoredProcedure; 21: 22: // get a reader and pass back all results 23: using (var reader = cmd.ExecuteReader()) 24: { 25: while(reader.Read()) 26: { 27: yield return new Product 28: { 29: Name = reader["product_name"].ToString(), 30: ... 31: }; 32: } 33: } 34: } 35: } 36: } 37: } The database details themselves are irrelevant. I will say, though, that I'm a big fan of using the System.Data.Common classes instead of your provider specific counterparts directly (SqlCommand, OracleCommand, etc). This lets you mock your data sources easily in unit testing and also allows you to swap out your provider in one line of code. In fact, one of the shared components I'm most proud of implementing was our group's DatabaseUtility library that simplifies all the database access above into one line of code in a thread-safe and provider-neutral way. I went with my own flavor instead of the EL due to the fact I didn't want to force internal company consumers to use the EL if they didn't want to, and it made it easy to allow them to mock their database for unit testing by providing a MockCommand, MockConnection, etc that followed the System.Data.Common model. One of these days I'll blog on that if anyone's interested. Regardless, you often have situations like the above where you are consuming and iterating through a resource that must be closed once you are finished iterating. For the reasons stated above, I didn't want to return IDataReader (that would force them to remember to Dispose it), and I didn't want to return List<Product> (that would force them to hold all products in memory) -- but the first time I wrote this, I was worried. What if you never consume the last item and exit the loop? Are the reader, command, and connection all disposed correctly? Of course, I was 99.999999% sure the creators of C# had already thought of this and taken care of it, but inspection in Reflector was difficult due to the nature of the state machines yield return generates, so I decided to try a quick example program to verify whether or not Dispose() will be called when an iterator is broken from outside the iterator itself -- i.e. before the iterator reports there are no more items. So I wrote a quick Sequencer class with a Dispose() method and an iterator for it. Yes, it is COMPLETELY contrived: 1: // A disposable sequence of int -- yes this is completely contrived... 2: internal class Sequencer : IDisposable 3: { 4: private int _i = 0; 5: private readonly object _mutex = new object(); 6: 7: // Constructs an int sequence. 8: public Sequencer(int start) 9: { 10: _i = start; 11: } 12: 13: // Gets the next integer 14: public int GetNext() 15: { 16: lock (_mutex) 17: { 18: return _i++; 19: } 20: } 21: 22: // Dispose the sequence of integers. 23: public void Dispose() 24: { 25: // force output immediately (flush the buffer) 26: Console.WriteLine("Disposed with last sequence number of {0}!", _i); 27: Console.Out.Flush(); 28: } 29: } And then I created a generator (infinite-loop iterator) that did the using block for auto-Disposal: 1: // simply defines an extension method off of an int to start a sequence 2: public static class SequencerExtensions 3: { 4: // generates an infinite sequence starting at the specified number 5: public static IEnumerable<int> GetSequence(this int starter) 6: { 7: // note the using here, will call Dispose() when block terminated. 8: using (var seq = new Sequencer(starter)) 9: { 10: // infinite loop on this generator, means must be bounded by caller! 11: while(true) 12: { 13: yield return seq.GetNext(); 14: } 15: } 16: } 17: } This is really the same conundrum as the database problem originally posed. Here we are using iteration (yield return) over a large collection (infinite sequence of integers). If we cut the sequence short by breaking iteration, will that using block exit and hence, Dispose be called? Well, let's see: 1: // The test program class 2: public class IteratorTest 3: { 4: // The main test method. 5: public static void Main() 6: { 7: Console.WriteLine("Going to consume 10 of infinite items"); 8: Console.Out.Flush(); 9: 10: foreach(var i in 0.GetSequence()) 11: { 12: // could use TakeWhile, but wanted to output right at break... 13: if(i >= 10) 14: { 15: Console.WriteLine("Breaking now!"); 16: Console.Out.Flush(); 17: break; 18: } 19: 20: Console.WriteLine(i); 21: Console.Out.Flush(); 22: } 23: 24: Console.WriteLine("Done with loop."); 25: Console.Out.Flush(); 26: } 27: } So, what do we see? Do we see the "Disposed" message from our dispose, or did the Dispose get skipped because from an "eyeball" perspective we should be locked in that infinite generator loop? Here's the results: 1: Going to consume 10 of infinite items 2: 0 3: 1 4: 2 5: 3 6: 4 7: 5 8: 6 9: 7 10: 8 11: 9 12: Breaking now! 13: Disposed with last sequence number of 11! 14: Done with loop. Yes indeed, when we break the loop, the state machine that C# generates for yield iterate exits the iteration through the using blocks and auto-disposes the IDisposable correctly. I must admit, though, the first time I wrote one, I began to wonder and that led to this test. If you've never seen iterators before (I wrote a previous entry here) the infinite loop may throw you, but you have to keep in mind it is not a linear piece of code, that every time you hit a "yield return" it cedes control back to the state machine generated for the iterator. And this state machine, I'm happy to say, is smart enough to clean up the using blocks correctly. I suspected those wily guys and gals at Microsoft engineered it well, and I wasn't disappointed. But, I've been bitten by assumptions before, so it's good to test and see. Yes, maybe you knew it would or figured it would, but isn't it nice to know? And as those campy 80s G.I. Joe cartoon public service reminders always taught us, "Knowing is half the battle...". Technorati Tags: C#,.NET

Search Results

Search found 3392 results on 136 pages for 'average joe'.

Page 122/136 | < Previous Page | 118 119 120 121 122 123 124 125 126 127 128 129 | Next Page >

- by John

- by Yugal Jindle

- by l_core

- by Bob Cross

- by Paul

- by Fadrian

- by M.M.

- by Michael Foukarakis

- by user236131

- by EmbeddedProg

- by GApple

- by Centurion

- by John

- by oyvinro

- by Mel

- by RonLugge

- by mgilson

- by user2534857

- by user89

- by dayne

- by genehack

- by Mike Sherov

- by someone

- by Sebi

- by James Michael Hare

< Previous Page | 118 119 120 121 122 123 124 125 126 127 128 129 | Next Page >