Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.

Page 149/172 | < Previous Page | 145 146 147 148 149 150 151 152 153 154 155 156 | Next Page >

Encryption puzzle / How to create a ProxyStub for a Remote Assistance ticket

- by Jon Clegg

I am trying to create a ticket for Remote Assistance. Part of that requires creating a PassStub parameter. As of the documenation: http://msdn.microsoft.com/en-us/library/cc240115(PROT.10).aspx PassStub: The encrypted novice computer's password string. When the Remote Assistance Connection String is sent as a file over e-mail, to provide additional security, a password is used.<16 In part 16 they detail how to create as PassStub. In Windows XP and Windows Server 2003, when a password is used, it is encrypted using PROV_RSA_FULL predefined Cryptographic provider with MD5 hashing and CALG_RC4, the RC4 stream encryption algorithm. As PassStub looks like this in the file: PassStub="LK#6Lh*gCmNDpj" If you want to generate one yourself run msra.exe in Vista or run the Remote Assistance tool in WinXP. The documentation says this stub is the result of the function CryptEncrypt with the key derived from the password and encrypted with the session id (Those are also in the ticket file). The problem is that CryptEncrypt produces a binary output way larger then the 15 byte PassStub. Also the PassStub isn't encoding in any way I've seen before. Some interesting things about the PassStub encoding. After doing statistical analysis the 3rd char is always a one of: !#$&()+-=@^. Only symbols seen everywhere are: *_ . Otherwise the valid characters are 0-9 a-z A-Z. There are a total of 75 valid characters and they are always 15 bytes. Running msra.exe with the same password always generates a different PassStub, indicating that it is not a direct hash but includes the rasessionid as they say. Some other ideas I've had is that it is not the direct result of CryptEncrypt, but a result of the rasessionid in the MD5 hash. In MS-RA (http://msdn.microsoft.com/en-us/library/cc240013(PROT.10).aspx). The "PassStub Novice" is simply hex encoded, and looks to be the right length. The problem is I have no idea how to go from any hash to way the ProxyStub looks like.

Read the article
Encryption puzzle / How to create a PassStub for a Remote Assistance ticket

- by Jon Clegg

I am trying to create a ticket for Remote Assistance. Part of that requires creating a PassStub parameter. As of the documentation: http://msdn.microsoft.com/en-us/library/cc240115(PROT.10).aspx PassStub: The encrypted novice computer's password string. When the Remote Assistance Connection String is sent as a file over e-mail, to provide additional security, a password is used.<16 In part 16 they detail how to create as PassStub. In Windows XP and Windows Server 2003, when a password is used, it is encrypted using PROV_RSA_FULL predefined Cryptographic provider with MD5 hashing and CALG_RC4, the RC4 stream encryption algorithm. As PassStub looks like this in the file: PassStub="LK#6Lh*gCmNDpj" If you want to generate one yourself run msra.exe in Vista or run the Remote Assistance tool in WinXP. The documentation says this stub is the result of the function CryptEncrypt with the key derived from the password and encrypted with the session id (Those are also in the ticket file). The problem is that CryptEncrypt produces a binary output way larger then the 15 byte PassStub. Also the PassStub isn't encoding in any way I've seen before. Some interesting things about the PassStub encoding. After doing statistical analysis the 3rd char is always a one of: !#$&()+-=@^. Only symbols seen everywhere are: *_ . Otherwise the valid characters are 0-9 a-z A-Z. There are a total of 75 valid characters and they are always 15 bytes. Running msra.exe with the same password always generates a different PassStub, indicating that it is not a direct hash but includes the rasessionid as they say. Some other ideas I've had is that it is not the direct result of CryptEncrypt, but a result of the rasessionid in the MD5 hash. In MS-RA (http://msdn.microsoft.com/en-us/library/cc240013(PROT.10).aspx). The "PassStub Novice" is simply hex encoded, and looks to be the right length. The problem is I have no idea how to go from any hash to way the PassStub looks like.

Read the article
Which key value store is the most promising/stable?

- by Mike Trpcic

I'm looking to start using a key/value store for some side projects (mostly as a learning experience), but so many have popped up in the recent past that I've got no idea where to begin. Just listing from memory, I can think of: CouchDB MongoDB Riak Redis Tokyo Cabinet Berkeley DB Cassandra MemcacheDB And I'm sure that there are more out there that have slipped through my search efforts. With all the information out there, it's hard to find solid comparisons between all of the competitors. My criteria and questions are: (Most Important) Which do you recommend, and why? Which one is the fastest? Which one is the most stable? Which one is the easiest to set up and install? Which ones have bindings for Python and/or Ruby? Edit: So far it looks like Redis is the best solution, but that's only because I've gotten one solid response (from ardsrk). I'm looking for more answers like his, because they point me in the direction of useful, quantitative information. Which Key-Value store do you use, and why? Edit 2: If anyone has experience with CouchDB, Riak, or MongoDB, I'd love to hear your experiences with them (and even more so if you can offer a comparative analysis of several of them)

Read the article
Where is my python script spending time? Is there "missing time" in my cprofile / pstats trace?

- by fmark

I am attempting to profile a long running python script. The script does some spatial analysis on raster GIS data set using the gdal module. The script currently uses three files, the main script which loops over the raster pixels called find_pixel_pairs.py, a simple cache in lrucache.py and some misc classes in utils.py. I have profiled the code on a moderate sized dataset. pstats returns: p.sort_stats('cumulative').print_stats(20) Thu May 6 19:16:50 2010 phes.profile 355483738 function calls in 11644.421 CPU seconds Ordered by: cumulative time List reduced from 86 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.008 0.008 11644.421 11644.421 <string>:1(<module>) 1 11064.926 11064.926 11644.413 11644.413 find_pixel_pairs.py:49(phes) 340135349 544.143 0.000 572.481 0.000 utils.py:173(extent_iterator) 8831020 18.492 0.000 18.492 0.000 {range} 231922 3.414 0.000 8.128 0.000 utils.py:152(get_block_in_bands) 142739 1.303 0.000 4.173 0.000 utils.py:97(search_extent_rect) 745181 1.936 0.000 2.500 0.000 find_pixel_pairs.py:40(is_no_data) 285478 1.801 0.000 2.271 0.000 utils.py:98(intify) 231922 1.198 0.000 2.013 0.000 utils.py:116(block_to_pixel_extent) 695766 1.990 0.000 1.990 0.000 lrucache.py:42(get) 1213166 1.265 0.000 1.265 0.000 {min} 1031737 1.034 0.000 1.034 0.000 {isinstance} 142740 0.563 0.000 0.909 0.000 utils.py:122(find_block_extent) 463844 0.611 0.000 0.611 0.000 utils.py:112(block_to_pixel_coord) 745274 0.565 0.000 0.565 0.000 {method 'append' of 'list' objects} 285478 0.346 0.000 0.346 0.000 {max} 285480 0.346 0.000 0.346 0.000 utils.py:109(pixel_coord_to_block_coord) 324 0.002 0.000 0.188 0.001 utils.py:27(__init__) 324 0.016 0.000 0.186 0.001 gdal.py:848(ReadAsArray) 1 0.000 0.000 0.160 0.160 utils.py:50(__init__) The top two calls contain the main loop - the entire analyis. The remaining calls sum to less than 625 of the 11644 seconds. Where are the remaining 11,000 seconds spent? Is it all within the main loop of find_pixel_pairs.py? If so, can I find out which lines of code are taking most of the time?

Read the article
Classification: Dealing with Abstain/Rejected Class

- by abner.ayala

I am asking for your input and/help on a classification problem. If anyone have any references that I can read to help me solve my problem even better. I have a classification problem of four discrete and very well separated classes. However my input is continuous and has a high frequency (50Hz), since its a real-time problem. The circles represent the clusters of the classes, the blue line the decision boundary and Class 5 equals the (neutral/resting do nothing class). This class is the rejected class. However the problem is that when I move from one class to the other I activate a lot of false positives in the transition movements, since the movement is clearly non-linear. For example, every time I move from class 5 (neutral class) to 1 I first see a lot of 3's before getting to the 1 class. Ideally, I will want my decision boundary to look like the one in the picture below where the rejected class is Class =5. Has a higher decision boundary than the others classes to avoid misclassification during transition. I am currently implementing my algorithm in Matlab using naive bayes, kNN, and SVMs optimized algorithms using Matlab. Question: What is the best/common way to handle abstain/rejected classes classes? Should I use (fuzzy logic, loss function, should I include resting cluster in the training)?

Read the article
An unusual type signature

- by Travis Brown

In Monads for natural language semantics, Chung-Chieh Shan shows how monads can be used to give a nicely uniform restatement of the standard accounts of some different kinds of natural language phenomena (interrogatives, focus, intensionality, and quantification). He defines two composition operations, A_M and A'_M, that are useful for this purpose. The first is simply ap. In the powerset monad ap is non-deterministic function application, which is useful for handling the semantics of interrogatives; in the reader monad it corresponds to the usual analysis of extensional composition; etc. This makes sense. The secondary composition operation, however, has a type signature that just looks bizarre to me: (<?>) :: (Monad m) => m (m a -> b) -> m a -> m b (Shan calls it A'_M, but I'll call it <?> here.) The definition is what you'd expect from the types; it corresponds pretty closely to ap: g <?> x = g >>= \h -> return $ h x I think I can understand how this does what it's supposed to in the context of the paper (handle question-taking verbs for interrogatives, serve as intensional composition, etc.). What it does isn't terribly complicated, but it's a bit odd to see it play such a central role here, since it's not an idiom I've seen in Haskell before. Nothing useful comes up on Hoogle for either m (m a -> b) -> m a -> m b or m (a -> b) -> a -> m b. Does this look familiar to anyone from other contexts? Have you ever written this function?

Read the article
Which book should I choose?

- by sebastianlarsson

Hi guys, I'm looking for a good read on object oriented design. The two books I'm currently looking Head First Design Patterns and Head First Object object-oriented analysis & design. They seem very similar when looking at the contents and browsing through available sample text. Which one would be the best choice? About myself: I have a bachelor in computer science and I am currently studying Msc. Software Quality Engineering (read Software Engineering with focus on Quality). I am already confident in object-oriented design and have a lot of programming courses in my backpack. I have done games in c++, courses in advanced java programming (I am SCJP certified), but my preferred language is C#. I have also worked with Java for the last 7 months while studying. I am currently also studying for certificates in C# (apart from my usual studies). So I believe I have the prerequisites of actually understanding the contents of both books. Reason: I just want to be better and keep evolving as a programmer. I think it is fun. I believe Bert Bates and Kathy Sierra are involved in both these books and I have previously read their SCJP preparation book in java. I really do enjoy their style of writing. Other books which I am considering are: Clean Code: A Handbook Of Agile Software Craftsmanship Thx in advance Sebastian

Read the article
Writing catch block with cleanup operations in Java ...

- by kedarmhaswade

I was not able to find any advise on catch blocks in Java that involve some cleanup operations which themselves could throw exceptions. The classic example is that of stream.close() which we usually call in the finally clause and if that throws an exception, we either ignore it by calling it in a try-catch block or declare it to be rethrown. But in general, how do I handle cases like: public void doIt() throws ApiException { //ApiException is my "higher level" exception try { doLower(); } catch(Exception le) { doCleanup(); //this throws exception too which I can't communicate to caller throw new ApiException(le); } } I could do: catch(Exception le) { try { doCleanup(); } catch(Exception e) { //ignore? //log? } throw new ApiException(le); //I must throw le } But that means I will have to do some log analysis to understand why cleanup failed. If I did: catch(Exception le) { try { doCleanup(); } catch(Exception e) { throw new ApiException(e); } It results in losing the le that got me here in the catch block in the fist place. What are some of the idioms people use here? Declare the lower level exceptions in throws clause? Ignore the exceptions during cleanup operation?

Read the article
ECM (Niche Vs Mass Market)

- by Luj Reyes

Hi Everyone, I recently started a little company with a couple of guys. Ours is the typical startup, a lot of ideas, dreams, talent and work hours :P. Our initial business plan was to develop a DM (Document Manager) with several features found on DropBox and other tools but with a big differentiator. Then we got in the team this Business Guy (I must say that several of us could be called 'Business Guys' but we are mainly hackers, he is just Another 'Networking Guy'), and along with him came this market analysis for a DM aimed at a very specific and narrow niche. We have many elements to believe in his market study and the idea is the classic "The market is X million, so if we grab a 10%...", and the market is really there to grab because all big providers deemed it too little and fled, let's say that the market is 5 million USD and demand very specific features. If we decide to go for this niche product we face a sales cycle of about 7 months, and the main goal of these revenue is to develop more ambitious projects. (Institutional VC is out of the question if you want to keep a marginal ownership of your company in my country). The only overlap between the niche and the mass market product features is the ability to store documents; everything else requires that we focus all of our efforts towards one or the other. I've studied a lot about the differences between Mass and Niche Markets, but I want to hear from people with actual experience. So everything comes down to this: If you have a really “saleable” idea what is the right thing to do: to go for the niche or go for the big prize and target primarily the mass market? Thanks for your input

Read the article
Hazelcast Distributed Executor Service KeyOwner

- by János Veres

I have problem understanding the concept of Hazelcast Distributed Execution. It is said to be able to perform the execution on the owner instance of a specific key. From Documentation: <T> Future<T> submitToKeyOwner(Callable<T> task, Object key) Submits task to owner of the specified key and returns a Future representing that task. Parameters: task - task key - key Returns: a Future representing pending completion of the task I believe that I'm not alone to have a cluster built with multiple maps which might actually use the same key for different purposes, holding different objects (e.g. something along the following setup): IMap<String, ObjectTypeA> firstMap = HazelcastInstance.getMap("firstMap"); IMap<String, ObjectTypeA_AppendixClass> secondMap = HazelcastInstance.getMap("secondMap"); To me it seems quite confusing what documentation says about the owner of a key. My real frustration is that I don't know WHICH - in which map - key does it refer to? The documentation also gives a "demo" of this approach: import com.hazelcast.core.Member; import com.hazelcast.core.Hazelcast; import com.hazelcast.core.IExecutorService; import java.util.concurrent.Callable; import java.util.concurrent.Future; import java.util.Set; import com.hazelcast.config.Config; public void echoOnTheMemberOwningTheKey(String input, Object key) throws Exception { Callable<String> task = new Echo(input); HazelcastInstance hz = Hazelcast.newHazelcastInstance(); IExecutorService executorService = hz.getExecutorService("default"); Future<String> future = executorService.submitToKeyOwner(task, key); String echoResult = future.get(); } Here's a link to the documentation site: Hazelcast MultiHTML Documentation 3.0 - Distributed Execution Did any of you guys figure out in the past what key does it want?

Read the article
I have two choices of Master's classes this fall. Which is the most useful?

- by ahplummer

(For background purposes and context): I am a Software Engineer, and manage other Software Engineers currently. I kind of wear two hats right now: one of a programmer, and one as a 'team lead'. In this regard, I've started going back to school to get my Master's degree with an emphasis in Computer Science. I already have a Bachelor's in Computer Science, and have been working in the field for about 13 years. Our primary development environment is a Windows environment, writing in .NET, Delphi, and SQL Server. Choice #1: CST 798 DATA VISUALIZATION Course Description: Basically, this is a course on the "Processing" language: http://processing.org/ Choice #2: CST 711 INFORMATICS Course Description: (From catalog): Informatics is the science of the use and processing of data, information, and knowledge. This course covers a variety of applied issues from information technology, information management at a variety of levels, ranging from simple data entry, to the creation, design and implementation of new information systems, to the development of models. Topics include basic information representation, processing, searching, and organization, evaluation and analysis of information, Internet-based information access tools, ethics and economics of information sharing.

Read the article
How to perform Linq select new with datetime in SQL 2008

- by kd7iwp

In our C# code I recently changed a line from inside a linq-to-sql select new query as follows: OrderDate = (p.OrderDate.HasValue ? p.OrderDate.Value.Year.ToString() + "-" + p.OrderDate.Value.Month.ToString() + "-" + p.OrderDate.Value.Day.ToString() : "") To: OrderDate = (p.OrderDate.HasValue ? p.OrderDate.Value.ToString("yyyy-mm-dd") : "") The change makes the line smaller and cleaner. It also works fine with our SQL 2008 database in our development environment. However, when the code deployed to our production environment which uses SQL 2005 I received an exception stating: Nullable Type must have a value. For further analysis I copied (p.OrderDate.HasValue ? p.OrderDate.Value.ToString("yyyy-mm-dd") : "") into a string (outside of a Linq statement) and had no problems at all, so it only causes an in issue inside my Linq. Is this problem just something to do with SQL 2005 using different date formats than from SQL 2008? Here's more of the Linq: dt = FilteredOrders.Where(x => x != null).Select(p => new { Order = p.OrderId, link = "/order/" + p.OrderId.ToString(), StudentId = (p.PersonId.HasValue ? p.PersonId.Value : 0), FirstName = p.IdentifierAccount.Person.FirstName, LastName = p.IdentifierAccount.Person.LastName, DeliverBy = p.DeliverBy, OrderDate = p.OrderDate.HasValue ? p.OrderDate.Value.Date.ToString("yyyy-mm-dd") : ""}).ToDataTable(); This is selecting from a List of Order objects. The FilteredOrders list is from another linq-to-sql query and I call .AsEnumerable on it before giving it to this particular select new query. Doing this in regular code works fine: if (o.OrderDate.HasValue) tempString += " " + o.OrderDate.Value.Date.ToString("yyyy-mm-dd");

Read the article
Memory usage in Flash / Flex / AS3

- by ggambett

I'm having some trouble with memory management in a flash app. Memory usage grows quite a bit, and I've tracked it down to the way I load assets. I embed several raster images in a class Embedded, like this [Embed(source="/home/gabriel/text_hard.jpg")] public static var ASSET_text_hard_DOT_jpg : Class; I then instance the assets this way var pClass : Class = Embedded[sResource] as Class; return new pClass() as Bitmap; At this point, memory usage goes up, which is perfectly normal. However, nulling all the references to the object doesn't free the memory. Based on this behavior, looks like the flash player is creating an instance of the class the first time I request it, but never ever releases it - not without references, calling System.gc(), doing the double LocalConnection trick, or calling dispose() on the BitmapData objects. Of course, this is very undesirable - memory usage would grow until everything in the SWFs is instanced, regardless of whether I stopped using some asset long ago. Is my analysis correct? Can anything be done to fix this?

Read the article
Oracle why does creating trigger fail when there is a field called timestamp?

- by Omar Kooheji

I've just wasted the past two hours of my life trying to create a table with an auto incrementing primary key bases on this tutorial, The tutorial is great the issue I've been encountering is that the Create Target fails if I have a column which is a timestamp and a table that is called timestamp in the same table... Why doesn't oracle flag this as being an issue when I create the table? Here is the Sequence of commands I enter: Creating the Table: CREATE TABLE myTable (id NUMBER PRIMARY KEY, field1 TIMESTAMP(6), timeStamp NUMBER, ); Creating the Sequence: CREATE SEQUENCE test_sequence START WITH 1 INCREMENT BY 1; Creating the trigger: CREATE OR REPLACE TRIGGER test_trigger BEFORE INSERT ON myTable REFERENCING NEW AS NEW FOR EACH ROW BEGIN SELECT test_sequence.nextval INTO :NEW.ID FROM dual; END; / Here is the error message I get: ORA-06552: PL/SQL: Compilation unit analysis terminated ORA-06553: PLS-320: the declaration of the type of this expression is incomplete or malformed Any combination that does not have the two lines with a the word "timestamp" in them works fine. I would have thought the syntax would be enough to differentiate between the keyword and a column name. As I've said I don't understand why the table is created fine but oracle falls over when I try to create the trigger... CLARIFICATION I know that the issue is that there is a column called timestamp which may or may not be a keyword. MY issue is why it barfed when I tried to create a trigger and not when I created the table, I would have at least expected a warning. That said having used Oracle for a few hours, it seems a lot less verbose in it's error reporting, Maybe just because I'm using the express version though. If this is a bug in Oracle how would one who doesn't have a support contract go about reporting it? I'm just playing around with the express version because I have to migrate some code from MySQL to Oracle.

Read the article
SSRS2005 timeout error

- by jaspernygaard

Hi I've been running around circles the last 2 days, trying to figure a problem in our customers live environment. I figured I might as well post it here, since google gave me very limited information on the error message (5 results to be exact). The error boils down to a timeout when requesting a certain report in SSRS2005, when a certain parameter is used. The deployment scenario is: Machine #1 Running reporting services (SQL2005, W2K3, IIS6) Machine #2 Running datawarehouse database (SQL2005, W2K3) which is the data source for #1 Both machines are running on the same vm cluster and LAN. The report requests a fairly simple SP - lets called it sp(param $a, param $b). When requested with param $a filled, it executes correctly. When using param $b, it times out after the global timeout periode has passed. If I run the stored procedure with param $b directly from sql management studio on #2, it returns the results perfectly fine (within 3-4s). I've profiled the datawarehouse database on #2 and when param $b is used, the query from the reporting service to the database, never reaches #2. The error message that I get upon timeout, when using param $b, when invoking the report directly from SSRS web interface is: "An error has occurred during report processing. Cannot read the next data row for the data set DataSet. A severe error occurred on the current command. The results, if any, should be discarded. Operation cancelled by user." The ExecutionLog for the SSRS does give me much information besides the error message rsProcessingAborted I'm running out of ideas of how to nail this problem. So I would greatly appreciate any comments, suggestions or ideas. Thanks in advance!

Read the article
mysql subselect alternative

- by Arnold

Hi, Lets say I am analyzing how high school sports records affect school attendance. So I have a table in which each row corresponds to a high school basketball game. Each game has an away team id and a home team id (FK to another "team table") and a home score and an away score and a date. I am writing a query that matches attendance with this seasons basketball games. My sample output will be (#_students_missed_class, day_of_game, home_team, away_team, home_team_wins_this_season, away_team_wins_this_season) I now want to add how each team did the previous season to my analysis. Well, I have their previous season stored in the game table but i should be able to accomplish that with a subselect. So in my main select statement I add the subselect: SELECT COUNT(*) FROM game_table WHERE game_table.date BETWEEN 'start of previous season' AND 'end of previous season' AND ( (game_table.home_team = team_table.id AND game_table.home_score > game_table.away_score) OR (game_table.away_team = team_table.id AND game_table.away_score > game_table.home_score)) In this case team-table.id refers to the id of the home_team so I now have all their wins calculated from the previous year. This method of calculation is neither time nor resource intensive. The Explain SQL shows that I have ALL in the Type field and I am not using a Key and the query times out. I'm not sure how I can accomplish a more efficient query with a subselect. It seems proposterously inefficient to have to write 4 of these queries (for home wins, home losses, away wins, away losses). I am sure this could be more lucid. I'll absolutely add color tomorrow if anyone has questions

Read the article
ploting 3d graph in matlab?

- by lytheone

Hello, I am currently a begineer, and i am using matlab to do a data analysis. I have a a text file with data at the first row is formatted as follow: time;wave height 1;wave height 2;....... I have column until wave height 19 and rows total 4000 rows. Data in the first column is time in second. From 2nd column onwards, it is wave height elevation which is in meter. At the moment I like to ask matlab to plot a 3d graph with time on the x axis, wave elevation on the y axis, and wave elevation that correspond to wave height number from 1 to 19, i.e. data in column 2 row 10 has a let say 8m which is correspond to wave height 1 and time at the column 1 row 10. I have try the following: clear; filename='abc.daf'; path='C:\D'; a=dlmread([path '\' filename],' ', 2, 1); [nrows,ncols]=size(a); t=a(1:nrows,1);%define t from text file for i=(1:20), j=(2:21); end wi=a(:,j); for k=(2:4000), l=k; end r=a(l,:); But everytime i use try to plot them, the for loop wi works fine, but for r=a(l,:);, the plot only either give me the last time data only but i want all data in the file to be plot. Is there a way i can do that. I am sorry as it is a bit confusing but i will be very thankful if anyone can help me out. Thank you!!!!!!!!!!

Read the article
How to identify ideas and concepts in a given text

- by Nick

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained: Maybe if you tell me a little more about who Mr Balzac is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph? It'd be great to be able to detect that the person has asked for a photograph of Mr Balzac. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like: Please, never send me a photo of Mr Balzac. Does anyone know where to start with this? Is it even possible? I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great. Thanks!

Read the article
What is a good programming language/environment for Linux database applications?

- by Dkellygb

I could use some advice on my move from the Windows world to Linux. For my business, I have used VB6 and Microsoft Access with both Access databases and SQL server in the past. The easy to use forms, report writers and programming language were perfect for CRUD apps and analysis for our small hotel/restaurant business. After using Linux at home for some time I would like to convert our small business. Our server is already a Linux box using Samba. I am happy with the OpenOffice.org applications instead of Microsoft Office. The only thing which is holding me back is a desktop database application where I can develop the forms and reports we require. Base does not seem to be up to the job yet from my experience. I would like something like VB.Net with visual studio (express) but I would like to avoid Mono – I just don’t see the point of it. (You can correct me if I’m wrong.) But a good collection of forms, controls and a good report writer would be ideal. I have looked at web based stuff like Ruby on Rails, but I think a webserver for our 5 pc network is overkill. I don’t mind running a proper database on our Ubuntu 9.10 server. I may have exposed a few prejudices above but my mind is open. Any thoughts?

Read the article
Is it possible to write map/reduce jobs for Amazon Elastic MapReduce using .NET?

- by Chris

Is it possible to write map/reduce jobs for Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/) using .NET languages? In particular I would like to use C#. Preliminary research suggests not. The above URL's marketing text suggests you have a "choice of Java, Ruby, Perl, Python, PHP, R, or C++", without mentioning .NET languages. This Amazon thread (http://developer.amazonwebservices.com/connect/thread.jspa?messageID=136051 -- "Support for C# / F# map/reducers") explicitly says that "currently Amazon Elastic MapReduce does not support Mono platform or languages such as C# or F#." The above suggests that it can't be done. I'm wondering if there are any workarounds, though. For example, can I modify the Elastic MapReduce machine image for my account, and install Mono on there? An alternative, suggested by Amazon FAQs "Using Other Software Required by Your Jar" (http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?CHAP_AdvancedTopics.html) and "How to Use Additional Files and Libraries With the Mapper or Reducer" (http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?addl_files.html), is to make the first step of the Map/Reduce job be to install Mono on the local instance. That sounds kind of inefficient, but maybe it could work? Maybe a saner alternative would be to try to forgo the convenience of Elastic MapReduce, and manually set up my own Hadoop cluster on EC2. Then I assume I could install Mono without difficulty.

Read the article
Handling missing/incomplete data in R--is there function to mask but not remove NAs?

- by doug

As you would expect from a DSL aimed at data analysis, R handles missing/incomplete data very well, for instance: Many R functions have an 'na.rm' flag that you can set to 'T' to remove the NAs: mean( c(5,6,12,87,9,NA,43,67), na.rm=T) But if you want to deal with NAs before the function call, you need to do something like this: to remove each 'NA' from a vector: vx = vx[!is.na(a)] to remove each 'NA' from a vector and replace it w/ a '0': ifelse(is.na(vx), 0, vx) to remove entire each row that contains 'NA' from a data frame: dfx = dfx[complete.cases(dfx),] All of these functions permanently remove 'NA' or rows with an 'NA' in them. Sometimes this isn't quite what you want though--making an 'NA'-excised copy of the data frame might be necessary for the next step in the workflow but in subsequent steps you often want those rows back (e.g., to calculate a column-wise statistic for a column that has missing rows caused by a prior call to 'complete cases' yet that column has no 'NA' values in it). to be as clear as possible about what i'm looking for: python/numpy has a class, 'masked array', with a 'mask' method, which lets you conceal--but not remove--NAs during a function call. Is there an analogous function in R?

Read the article
Reconstructing trees from a "fingerprint"

- by awshepard

I've done my SO and Google research, and haven't found anyone who has tackled this before, or at least, anyone who has written about it. My question is, given a "universal" tree of arbitrary height, with each node able to have an arbitrary number of branches, is there a way to uniquely (and efficiently) "fingerprint" arbitrary sub-trees starting from the "universal" tree's root, such that given the universal tree and a tree's fingerprint, I can reconstruct the original tree? For instance, I have a "universal" tree (forgive my poor illustrations), representing my universe of possibilities: Root / / / | \ \ ... \ O O O O O O O (Level 1) /|\/|\...................\ (Level 2) etc. I also have tree A, a rooted subtree of my universe Root / /|\ \ O O O O O / Etc. Is there a way to "fingerprint" the tree, so that given that fingerprint, and the universal tree, I could reconstruct A? I'm thinking something along the lines of a hash, a compression, or perhaps a functional/declarative construction? Big-O analysis (in time or space) is a plus. As a for-instance, a nested expression like: {{(Root)},{(1),(2),(3)},{(2,3),(1),(4,5)}...} representing the actual nodes present at each level in the tree is probably valid, but can it be done more efficiently?

Read the article
MySQL table data transformation -- how can I dis-aggreate MySQL time data?

- by lighthouse65

We are coding for a MySQL data warehousing application that stores descriptive data (User ID, Work ID, Machine ID, Start and End Time columns in the first table below) associated with time and production quantity data (Output and Time columns in the first table below) upon which aggregate (SUM, COUNT, AVG) functions are applied. We now wish to dis-aggregate time data for another type of analysis. Our current data table design: +---------+---------+------------+---------------------+---------------------+--------+------+ | User ID | Work ID | Machine ID | Event Start Time | Event End Time | Output | Time | +---------+---------+------------+---------------------+---------------------+--------+------+ | 080025 | ABC123 | M01 | 2008-01-24 16:19:15 | 2008-01-24 16:34:45 | 2120 | 930 | +---------+---------+------------+---------------------+---------------------+--------+------+ Reprocessing dis-aggregation that we would like to do would be to transform table content based on a granularity of minutes, rather than the current production event ("Event Start Time" and "Event End Time") granularity. The resulting reprocessing of existing table rows would look like: +---------+---------+------------+---------------------+--------+ | User ID | Work ID | Machine ID | Production Minute | Output | +---------+---------+------------+---------------------+--------+ | 080025 | ABC123 | M01 | 2010-01-24 16:19 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:20 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:21 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:23 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:24 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:25 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:26 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:27 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:28 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:29 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:30 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:31 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:33 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:34 | 133 | +---------+---------+------------+---------------------+--------+ So the reprocessing would take an existing row of data created at the granularity of production event and modify the granularity to minutes, eliminating redundant (Event End Time, Time) columns while doing so. It assumes a constant rate of production and divides output by the difference in minutes plus one to populate the new table's Output column. I know this can be done in code...but can it be done entirely in a MySQL insert statement (or otherwise entirely in MySQL)? I am thinking of a INSERT ... INTO construction but keep getting stuck. An additional complexity is that there are hundreds of machines to include in the operation so there will be multiple rows (one for each machine) for each minute of the day. Any ideas would be much appreciated. Thanks.

Read the article
Asymptotic runtime of list-to-tree function

- by Deestan

I have a merge function which takes time O(log n) to combine two trees into one, and a listToTree function which converts an initial list of elements to singleton trees and repeatedly calls merge on each successive pair of trees until only one tree remains. Function signatures and relevant implementations are as follows: merge :: Tree a -> Tree a -> Tree a --// O(log n) where n is size of input trees singleton :: a -> Tree a --// O(1) empty :: Tree a --// O(1) listToTree :: [a] -> Tree a --// Supposedly O(n) listToTree = listToTreeR . (map singleton) listToTreeR :: [Tree a] -> Tree a listToTreeR [] = empty listToTreeR (x:[]) = x listToTreeR xs = listToTreeR (mergePairs xs) mergePairs :: [Tree a] -> [Tree a] mergePairs [] = [] mergePairs (x:[]) = [x] mergePairs (x:y:xs) = merge x y : mergePairs xs This is a slightly simplified version of exercise 3.3 in Purely Functional Data Structures by Chris Okasaki. According to the exercise, I shall now show that listToTree takes O(n) time. Which I can't. :-( There are trivially ceil(log n) recursive calls to listToTreeR, meaning ceil(log n) calls to mergePairs. The running time of mergePairs is dependent on the length of the list, and the sizes of the trees. The length of the list is 2^h-1, and the sizes of the trees are log(n/(2^h)), where h=log n is the first recursive step, and h=1 is the last recursive step. Each call to mergePairs thus takes time (2^h-1) * log(n/(2^h)) I'm having trouble taking this analysis any further. Can anyone give me a hint in the right direction?

Read the article
Hadoop reduce task gets hung

- by user806098

I set up a hadoop cluster with 4 nodes, When running a map-reduce task, the map task finishes quickly, while the reduce task hangs at 27% percent. I checked the log, it's that the reduce task fails to fetch map output from map nodes. The job tracker log of master shows messages like this: 2011-06-27 19:55:14,748 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201106271953_0001_r_000000_0' to tip task_201106271953_0001_r_000000, for tracker 'tracker_web30.bbn.com.cn:localhost/127.0.0.1:56476' And the name node log of master shows messages like this: 2011-06-27 14:00:52,898 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310, call register(DatanodeRegistration(202.106.199.39:50010, storageID=DS-1989397900-202.106.199.39-50010-1308723051262, infoPort=50075, ipcPort=50020)) from 192.168.225.19:16129: error: java.io.IOException: verifyNodeRegistration: unknown datanode 202.106.199.3 9:50010 However, neither the "web30.bbn.com.cn" or 202.106.199.39, 202.106.199.3 is the slave node. I think such ip/domains appear because hadoop fails to resolve a node(first in the Intranet DNS server), then it goes to a higher-level DNS server, later to the top, still fails, then the "junk" ip/domains are returned. But I checked my config, it goes like this: /etc/hosts: 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 192.168.225.16 master 192.168.225.66 slave1 192.168.225.20 slave5 192.168.225.17 slave17 conf/core-site.xml: hadoop.tmp.dir /root/hadoop_tmp/hadoop_${user.name} fs.default.name hdfs://master:54310 io.sort.mb 1024 hdfs-site.xml: dfs.replication 3 masters: master slaves: master slave1 slave5 slave17 Also, all firewalls(iptables) are turned off, and ssh between each 2 nodes is ok. so I don't know where exact the error comes from. Please help. Thanks a lot.

Read the article

< Previous Page | 145 146 147 148 149 150 151 152 153 154 155 156 | Next Page >