Search Results

Search found 1753 results on 71 pages for 'consistent hashing'.

Page 7/71 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

Combining Java hashcodes into a "master" hashcode

- by Nick Wiggill

I have a vector class with hashCode() implemented. It wasn't written by me, but uses 2 prime numbers by which to multiply the 2 vector components before XORing them. Here it is: /*class Vector2f*/ ... public int hashCode() { return 997 * ((int)x) ^ 991 * ((int)y); //large primes! } ...As this is from an established Java library, I know that it works just fine. Then I have a Boundary class, which holds 2 vectors, "start" and "end" (representing the endpoints of a line). The values of these 2 vectors are what characterize the boundary. /*class Boundary*/ ... public int hashCode() { return 1013 * (start.hashCode()) ^ 1009 * (end.hashCode()); } Here I have attempted to create a good hashCode() for the unique 2-tuple of vectors (start & end) constituting this boundary. My question: Is this hashCode() implementation going to work? (Note that I have used 2 different prime numbers in the latter hashCode() implementation; I don't know if this is necessary but better to be safe than sorry when trying to avoid common factors, I guess -- since I presume this is why primes are popular for hashing functions.)

Read the article
Efficient and accurate way to compact and compare Python lists?

- by daveslab

Hi folks, I'm trying to a somewhat sophisticated diff between individual rows in two CSV files. I need to ensure that a row from one file does not appear in the other file, but I am given no guarantee of the order of the rows in either file. As a starting point, I've been trying to compare the hashes of the string representations of the rows (i.e. Python lists). For example: import csv hashes = [] for row in csv.reader(open('old.csv','rb')): hashes.append( hash(str(row)) ) for row in csv.reader(open('new.csv','rb')): if hash(str(row)) not in hashes: print 'Not found' But this is failing miserably. I am constrained by artificially imposed memory limits that I cannot change, and thusly I went with the hashes instead of storing and comparing the lists directly. Some of the files I am comparing can be hundreds of megabytes in size. Any ideas for a way to accurately compress Python lists so that they can be compared in terms of simple equality to other lists? I.e. a hashing system that actually works? Bonus points: why didn't the above method work?

Read the article
Keep coding the wrong way to remain consistent? [closed]

- by bwalk2895

Possible Duplicate: Code maintenance: keeping a bad pattern when extending new code for being consistent, or not? To keep things simple let's say I am responsible for maintaining two applications, AwesomeApp and BadApp (I am responsible for more and no that is not their actual names). AwesomeApp is a greenfield project I have been working on with other members on my team. It was coded using all the fancy buzzwords, Multilayer, SOA, SOLID, TDD, and so on. It represents the direction we want to go as a team. BadApp is a application that has been around for a long time. The architecture suffers from many sins, namely everything is tightly coupled together and it is not uncommon to get a circular dependency error from the compiler, it is almost impossible to unit test, large classes, duplicate code, and so on. We have a plan to rewrite the application following the standards established by AwesomeApp, but that won't happen for a while. I have to go into BadApp and fix a bug, but after spending months coding what I consider correctly, I really don't want do continue perpetuate bad coding practices. However, the way AwesomeApp is coded is vastly different from the way BadApp is coded. I fear implementing the "correct" way would cause confusion for other developers who have to maintain the application. Question: Is it better to keep coding the wrong way to remain consistent with the rest of the code in the application (knowing it will be replaced) or is it better to code the right way with an understanding it could cause confusion because it is so much different? To give you an example. There is a large class (1000+ lines) with several functions. One of the functions is to calculate a date based on an enumerated value. Currently the function handles all the various calculations. The function relies on no other functionality within the class. It is self contained. I want to break the function into smaller functions (at the very least) and put them into their own classes and hide those classes behind an interface (at the most) and use the factory pattern to instantiate the date classes. If I just broke it out into smaller functions within the class it would follow the existing coding standard. The extra steps are to start following some of the SOLID principles.

Read the article
Is there a way to create a consistent snapshot/SnapMirror across multiple volumes?

- by Tomer Gabel

We use a NetApp FAS 6-series filer with an application that spans multiple volumes. For backup purposes I would like to create a consistent snapshot that spans these volumes at the same point in time (or at least with an extremely low delta); additionally, we'd like to to use SnapMirror to replace the production environment to test volumes. The problem is in creating a consistent snapshot/SnapMirror, since these commands are not transactional and do not take multiple parameters. I tried scripting consecutive "snap create" or "snapmirror resync" commands via SSH, but there's always a 0.5-2 second difference between each snapshot. It's currently "good enough", but I'm seriously concerned about the consistency impact with increased load (we're currently in pre-production). Has anyone managed to create a consistent snapshot that spans several volumes? How did you pull it off?

Read the article
How can I get access to password hashing in postgresql? Tried installing postgresql-contrib in ubun

- by Tchalvak

So I'm trying to just hash some passwords in postgresql, and the only hashing solution that I've found for postgresql is part of the pgcrytpo package ( http://www.postgresql.org/docs/8.3/static/pgcrypto.html ) that is supposed to be in postgresql-contrib ( http://www.postgresql.org/docs/8.3/static/contrib.html ). So I installed postgresql-contrib, (sudo apt-get install postgresql-contrib), restarted my server (as a simple way to restart postgresql). However, I still don't have access to any of the functions for hashing that are supposed to be in postgresql-contrib, e.g.: ninjawars=# select crypt('global salt' || 'new password' || 'user created date', gen_salt('sha256')); ERROR: function gen_salt(unknown) does not exist ninjawars=# select digest('test', 'sha256') from players limit 1; ERROR: function digest(unknown, unknown) does not exist ninjawars=# select hmac('test', 'sha256') from players limit 1; ERROR: function hmac(unknown, unknown) does not exist So how can I hash passwords in postgresql, on ubuntu?

Read the article
Code maintenance: keeping a bad pattern when extending new code for being consistent or not ?

- by Guillaume

I have to extend an existing module of a project. I don't like the way it has been done (lots of anti-pattern involved, like copy/pasted code). I don't want to perform a complete refactor. Should I: create new methods using existing convention, even if I feel it wrong, to avoid confusion for the next maintainer and being consistent with the code base? or try to use what I feel better even if it is introducing another pattern in the code ? Precison edited after first answers: The existing code is not a mess. It is easy to follow and understand. BUT it is introducing lots of boilerplate code that can be avoided with good design (resulting code might become harder to follow then). In my current case it's a good old JDBC (spring template inboard) DAO module, but I have already encounter this dilemma and I'm seeking for other dev feedback. I don't want to refactor because I don't have time. And even with time it will be hard to justify that a whole perfectly working module needs refactoring. Refactoring cost will be heavier than its benefits. Remember: code is not messy or over-complex. I can not extract few methods there and introduce an abstract class here. It is more a flaw in the design (result of extreme 'Keep It Stupid Simple' I think) So the question can also be asked like that: You, as developer, do you prefer to maintain easy stupid boring code OR to have some helpers that will do the stupid boring code at your place ? Downside of the last possibility being that you'll have to learn some stuff and maybe you will have to maintain the easy stupid boring code too until a full refactoring is done)

Read the article
HashSet vs. List performance

- by Michael Damatov

It's clear that a search performance of the generic HashSet<T> class is higher than of the generic List<T> class. Just compare the hash-based key with the linear approach in the List<T> class. However calculating a hash key may itself take some CPU cycles, so for a small amount of items the linear search can be a real alternative to the HashSet<T>. My question: where is the break-even? To simplify the scenario (and to be fair) let's assume that the List<T> class uses the element's Equals() method to identify an item.

Read the article
Computing MD5SUM of large files in C#

- by spkhaira

I am using following code to compute MD5SUM of a file - byte[] b = System.IO.File.ReadAllBytes(file); string sum = BitConverter.ToString(new MD5CryptoServiceProvider().ComputeHash(b)); This works fine normally, but if I encounter a large file (~1GB) - e.g. an iso image or a DVD VOB file - I get an Out of Memory exception. Though, I am able to compute the MD5SUM in cygwin for the same file in about 10secs. Please suggest how can I get this to work for big files in my program. Thanks

Read the article
Why do I get xfs_freeze "Operation not supported" error with ec2-consistent-snapshot? Debian Squeeze w/ext4 filesystem

- by Michael Endsley

I'm running the following command: [root@somehost ~]# ec2-consistent-snapshot --aws-credentials-file '/some/dir/file' --mysql --mysql-socket '/var/run/mysqld/mysql.sock' --mysql-username 'backup' --mysql-password 'password' --freeze-filesystem '/dev/xvda1' vol-xxxxxx It returns this error: xfs_freeze: cannot freeze filesystem at /dev/xvda1: Operation not supported ec2-consistent-snapshot: ERROR: xfs_freeze -f /dev/xvda1: failed(256) snap-eeb66393 xfs_freeze: cannot unfreeze filesystem mounted at /dev/xvda1: Invalid argument ec2-consistent-snapshot: ERROR: xfs_freeze -u /dev/xvda1: failed(256) This is being run on Debian Squeeze with the ext4 Linux filesystem. Can anyone explain this error to me, or what might be its cause? When googling, I found information about it needing to be executed with sudo, but I'm performing the entire operation as root. I also found some posts about trying to run it after a CentOS upgrade using yum, but the situation appeared dissimilar. It's difficult to find things referring to this situation exactly. xfs_freeze is available for use on the filesystem. Is it possible that the filesystem, despite being ext4, somehow doesn't support freezing? Sorry if I've missed some bit of StackExchange etiquette with this post -- it's my first venture here!

Read the article
SHA-1 and Unicode

- by Andrew

Hi everyone, Is behavior of SHA-1 algorithm defined for Unicode strings? I do realize that SHA-1 itself does not care about the content of the string, however, it seems to me that in order to pass standard tests for SHA-1, the input string should be encoded with UTF-8.

Read the article
Tracking unique versions of files with hashes

- by rwmnau

I'm going to be tracking different versions of potentially millions of different files, and my intent is to hash them to determine I've already seen that particular version of the file. Currently, I'm only using MD5 (the product is still in development, so it's never dealt with millions of files yet), which is clearly not long enough to avoid collisions. However, here's my question - Am I more likely to avoid collisions if I hash the file using two different methods and store both hashes (say, SHA1 and MD5), or if I pick a single, longer hash (like SHA256) and rely on that alone? I know option 1 has 288 hash bits and option 2 has only 256, but assume my two choices are the same total hash length. Since I'm dealing with potentially millions of files (and multiple versions of those files over time), I'd like to do what I can to avoid collisions. However, CPU time isn't (completely) free, so I'm interested in how the community feels about the tradeoff - is adding more bits to my hash proportionally more expensive to compute, and are there any advantages to multiple different hashes as opposed to a single, longer hash, given an equal number of bits in both solutions?

Read the article
PHP CRC32 length output

- by Industrial

Hi! Is there anything that can make the returned length of the PHP CRC32 function to vary? Thanks!

Read the article
Is SHA-1 secure for password storage?

- by Tgr

Some people throw around remarks like "SHA-1 is broken" a lot, so I'm trying to understand what exactly that means. Let's assume I have a database of SHA-1 password hashes, and an attacker whith a state of the art SHA-1 breaking algorithm and a botnet with 100,000 machines gets access to it. (Having control over 100k home computers would mean they can do about 10^15 operations per second.) How much time would they need to find out the password of any one user? find out the password of a given user? find out the password of all users? find a way to log in as one of the users? find a way to log in as a specific user? How does that change if the passwords are salted? Does the method of salting (prefix, postfix, both, or something more complicated like xor-ing) matter? Here is my current understanding, after some googling. Please correct in the answers if I misunderstood something. If there is no salt, a rainbow attack will immediately find all passwords (except extremely long ones). If there is a sufficiently long random salt, the most effective way to find out the passwords is a brute force or dictionary attack. Neither collision nor preimage attacks are any help in finding out the actual password, so cryptographic attacks against SHA-1 are no help here. It doesn't even matter much what algorithm is used - one could even use MD5 or MD4 and the passwords would be just as safe (there is a slight difference because computing a SHA-1 hash is slower). To evaluate how safe "just as safe" is, let's assume that a single sha1 run takes 1000 operations and passwords contain uppercase, lowercase and digits (that is, 60 characters). That means the attacker can test 1015*60*60*24 / 1000 ~= 1017 potential password a day. For a brute force attack, that would mean testing all passwords up to 9 characters in 3 hours, up to 10 characters in a week, up to 11 characters in a year. (It takes 60 times as much for every additional character.) A dictionary attack is much, much faster (even an attacker with a single computer could pull it off in hours), but only finds weak passwords. To log in as a user, the attacker does not need to find out the exact password; it is enough to find a string that results in the same hash. This is called a first preimage attack. As far as I could find, there are no preimage attacks against SHA-1. (A bruteforce attack would take 2160 operations, which means our theoretical attacker would need 1030 years to pull it off. Limits of theoretical possibility are around 260 operations, at which the attack would take a few years.) There are preimage attacks against reduced versions of SHA-1 with negligible effect (for the reduced SHA-1 which uses 44 steps instead of 80, attack time is down from 2160 operations to 2157). There are collision attacks against SHA-1 which are well within theoretical possibility (the best I found brings the time down from 280 to 252), but those are useless against password hashes, even without salting. In short, storing passwords with SHA-1 seems perfectly safe. Did I miss something?

Read the article
Salt question - using a "random salt"

- by barfoon

Hey everyone, Further to my question here, I have another question regarding salts. When someone says "use a random salt" to pre/append to a password, does this mean: Creating a static a 1 time randomly generated string of characters, or Creating a string of characters that changes at random every time a password is created? If the salt is random for every user and stored along with the hashed password, how is the original salt ever retrieved back for verification? Thanks!

Read the article
Cache SHA1 digest result?

- by johnathan

I'm storing several versions of a file based on a digest of the original filename and its version, like this: $filename = sha1($original . ':' . $version); Would it be worth it to cache the digest ($filename) in memcache as a key/value pair (the key being the original + version and value the sha1 hash), or is generating the digest quick enough (for a high traffic php web app)? Thanks, Johnathan

Read the article
Is it possible to get identical SHA1 hash?

- by drozzy

Given two different strings S1 and S2 (S1 != S2) is it possible that: SHA1(S1) == SHA1(S2) is True? If yes - with what probability? Is there a upper bound on the length of a string, for which probably of getting duplicates is 0? If not - why not? Thanks

Read the article
Can anybody explain the logic behind djb2 hash funcition.

- by de costo

Can anybody explain the logic behind the use of djb2 hash function as a good option for strings. The algorithm can be found at http://www.cse.yorku.ca/~oz/hash.html Why is it that 5381 and 33 hold such an importance in djb2 algorithm ??? Thanks, De Costo

Read the article
using Multi Probe LSH with LSHKIT

- by Yijinsei

Hi Guys, I have read through the source code for mplsh, but I still unsure on how to use the indexes generated by lshkit to speed up the process in comparing feature vector in Euclidean Distance. Do you guys have any experience regarding this?

Read the article
Good hash function for a 2d index

- by rlbond

I have a struct called Point. Point is pretty simple: struct Point { Row row; Column column; // some other code for addition and subtraction of points is there too } Row and Column are basically glorified ints, but I got sick of accidentally transposing the input arguments to functions and gave them each a wrapper class. Right now I use a set of points, but repeated lookups are really slowing things down. I want to switch to an unordered_set. So, I want to have an unordered_set of Points. Typically this set might contain, for example, every point on a 80x24 terminal = 1920 points. I need a good hash function. I just came up with the following: struct PointHash : public std::unary_function<Point, std::size_t> { result_type operator()(const argument_type& val) const { return val.row.value() * 1000 + val.col.value(); } }; However, I'm not sure that this is really a good hash function. I wanted something fast, since I need to do many lookups very quickly. Is there a better hash function I can use, or is this OK?

Read the article
Cycles/byte calculations

- by matskn

Hi ! In Crypto communities it is common to measure algorithm performance in cycles/byte. My question is, which parameters in the CPU architecture are affecting this number? Except the clockspeed ofcourse :)

Read the article
How does a hash table work?

- by Arec Barrwin

I'm looking for an explanation of how a hashtable works - in plain English for a simpleton like me! For example I know it takes the key, calculates the hash (how?) and then performs some kind of modulo to work out where it lies in the array that the value is stored, but that's where my knowledge stops. Could anyone clarify the process. Edit: I'm not looking specifically about how hashcodes are calculated, but a general overview of how a hashtable works.

Read the article
MAD method compression function

- by Jacques

I ran across the question below in an old exam. My answers just feels a bit short and inadequate. Any extra ideas I can look into or reasons I have overlooked would be great. Thanx Consider the MAD method compression function, mapping an object with hash code i to element [(3i + 7)mod9027]mod6000 of the 6000-element bucket array. Explain why this is a poor choice of compression function, and how it could be improved. I basically just say that the function could be improved by changing the value for p (or 9027) to an prime number and choosing an other constant for a (or 3) could also help.

Read the article
How to create a asp.net membership provider hashed password manually?

- by Anheledir

I'm using a website as a frontend and all users are authenticated with the standard ASP.NET Membership-Provider. Passwords are saved "hashed" within a SQL-Database. Now I want to write a desktop-client with administrative functions. Among other things there should be a method to reset a users password. I can access the database with the saved membership-data, but how can I manually create the password-salt and -hash? Using the System.Web.Membership Namespace seems to be inappropriate so I need to know how to create the salt and hash of the new password manually. Experts step up! :)

Read the article
checking crc32 of a file

- by agent154

This is not really a "how to" question. Is there a "standard" file structure that applications use to store the checksums of files in a folder? I'm developing a tool to check various things like crc32, md5, sha1, sha256, etc... I'd like to have my program store the various hashes in files in the folder of what I'm checking. I know that there is a file commonly used called 'md5sums' or 'sha1sums'. But what about CRC? I haven't noticed any around. And if there is, what's the structure of it? Thanks.

Read the article
How do I find hash value of a 3D vector ?

- by brainydexter

I am trying to perform broad-phase collision detection with a fixed-grid size approach. Thus, for each entity's position: (x,y,z) (each of type float), I need to find which cell does the entity lie in. I then intend to store all the cells in a hash-table and then iterate through to report (if any) collisions. So, here is what I am doing: Grid-cell's position: (int type) (Gx, Gy, Gz) = (x / M, y / M, z / M) where M is the size of the grid. Once, I have a cell, I'd like to add it to a hash-table with its key being a unique hash based on (Gx, Gy, Gz) and the value being the cell itself. Now, I cannot think of a good hash function and I need some help with that. Can someone please suggest me a good hash function? Thanks

Read the article

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >