Search Results

Search found 77 results on 4 pages for 'deduplication'.

Page 1/4 | 1 2 3 4  | Next Page >

  • Block-level deduplication on Linux

    - by Benoît
    NetApp provides block-level deduplication (ASIS). Do you know any filesystem (even FUSE-based) on Linux (or OpenSolaris, *BSD) that provides the same functionnality ? (I'm not interested in false deduplication like hardlinks).

    Read the article

  • Removing duplicate images (deduplication) - calculating "overlap" of images

    - by jotango
    Hello, I have a ton of product images on our file system. Our code removes 100% identical images (or does not allow them to be uploaded). However our sellers often upload items pictures which are very similar, but not exactly. They could have more whitespace, a worse quality (compression), a different size etc. Is there any way I can calculate the degree of overlap between two images, to flag ones for deletion? Kind of like a Levenshtein distance between two images... Any pointers would be very cool. Thanks!

    Read the article

  • T-SQL Query Results Not as Expected Deduplication

    - by Yoda
    Hi Guys, I am attempting to get all records where and Id field exists more than once, trouble is my query is returning nothing and I have no idea as to why!? And this is the only method I know. Here is my code: select [Customer Number], [Corporate Customer Number], [Order Date], [Order Number], [Order No], [Order Line Status], [Payment Method] , [ProcessOrder], [Order Platform] from Temp_ICOSOrder group by [Customer Number], [Corporate Customer Number], [Order Date], [Order Number], [Order No], [Order Line Status], [Payment Method] , [ProcessOrder] , [Order Platform] having COUNT([Order Number]) > 1 Any help is much appriciated!

    Read the article

  • VMware ESXi 4 On-Disk Data Deduplication - possible and supported?

    - by hurikhan77
    Environment: We are running multiple web, database, and application servers which usually share a pretty common installation (gentoo linux) and similar configuration in VMware ESXi 4. The differences are usually only some installed features or differing component versions. To create a new server, I usually choose the most similar (by features) running server, rsync a copy of it into freshly mounted filesystems, run grub, reconfigure and reboot. Problem: Over time this duplicates many on-disk data blocks which probably sums up to several 10's of gigabytes. I suppose if I could use a base system as template with the actual machines based on top of that, only writing changed blocks to some sort of "diff image", performance should improve (increased cache hit rate) and storage efficiency should increase (deduplicated storage space). This would be similar to what ESXi already supports for RAM deduplication (page sharing). Question: Is there any way to easily do this on ESXi 4? I already share the portage tree via NFS but this would not work for the rootfs.

    Read the article

  • SQL SERVER – Iridium I/O – SQL Server Deduplication that Shrinks Databases and Improves Performance

    - by Pinal Dave
    Database performance is a common problem for SQL Server DBA’s.  It seems like we spend more time on performance than just about anything else.  In many cases, we use scripts or tools that point out performance bottlenecks but we don’t have any way to fix them.  For example, what do you do when you need to speed up a query that is already tuned as well as possible?  Or what do you do when you aren’t allowed to make changes for a database supporting a purchased application? Iridium I/O for SQL Server was originally built at Confio software (makers of Ignite) because DBA’s kept asking for a way to actually fix performance instead of just pointing out performance problems. The technology is certified by Microsoft and was so promising that it was spun out into a separate company that is now run by the Confio Founder/CEO and technology management team. Iridium uses deduplication technology to both shrink the databases as well as boost IO performance.  It is intriguing to see it work.  It will deduplicate a live database as it is running transactions.  You can watch the database get smaller while user queries are running. Iridium is a simple tool to use. After installing the software, you click an “Analyze” button which will spend a minute or two on each database and estimate both your storage and performance savings.  Next, you click an “Activate” button to turn on Iridium I/O for your selected databases.  You don’t need to reboot the operating system or restart the database during any part of the process. As part of my test, I also wanted to see if there would be an impact on my databases when Iridium was removed.  The ‘revert’ process (bringing the files back to their SQL Server native format) was executed by a simple click of a button, and completed while the databases were available for normal processing. I was impressed and enjoyed playing with the software and encourage all of you to try it out.  Here is the link to the website to download Iridium for free. . Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Mavericks permission issues with Windows Server deduplicated shares

    - by dmohlmaster
    We have a number of 10.9-10.9.3 - Mavericks - machines installed throughout our facility. Much of the user content is pulled from shares stored on our Windows Server 2012 fileservers with deduplication enabled. I have found that files newly written or unoptimized are able to be accessed without issue - read, written, modified, etc. Once the file gets optomized/deduplicated and Windows adds the P & L attributes - sparse and symlink - the Macs running Mavericks begin to have access issues. Once the files get deduplicated, users begin receiving read access errors when copying files (see error1 below). This happens when copying to folders within the current folder tree or copying somewhere to the local system. If you 'stop' the copy operation and retry a few more times, it may eventually work for the specific instance but fail again later. I am however, able to copy these files without issue via the terminal. Other systems running 10.7 do not experience the same issues and are able to access file server resources without issue. Many of the systems having issues are newer and thus not able to be downgraded to 10.8 or 10.7. I have tried finder replacements such as Pathfinder but the results are the same. I know this is at least similar to the issues many Mac users are already experiencing and posting about but I haven't seen it directly linked to deduplication and the attributes written by Windows server. Has anyone seen this issue? Have any solutions been found? Error 1: When copying files after the PL attributes have been set by deduplication. "One or more items can't be copied to "Foler" because you don't have permissions to read them. ******************************************' Via the system.log, I am also seeing the following error when accessing these deduplicated file shares. The reparse point tag listed below is "IO_REPARSE_TAG_DEDUP" Reported error: "smbfs_nget: filename.ext - unknown reparse point tag 0x80000013"

    Read the article

  • How to Eliminate Tape Backup and Off-site Storage Service?

    - by Daniel Lucas
    PLEASE READ UPDATE AT THE BOTTOM. THANKS! ;) Environment Info (all Windows): 2 sites 30 servers site #1 (3TB of backup data) 5 servers site #2 (1TB of backup data) MPLS backbone tunnel connecting site #1 and site #2 Current Backup Process: Online Backup (disk-to-disk) Site #1 has a server running Symantec Backup Exec 12.5 with four 1TB USB 2.0 disks. BE jobs for full backups run nightly on all servers in site #1 to these disks. Site #2 backs up to a central file server there using software they already had when we purchased them. A BE job pulls that data nightly to site #1 and stores them on said disks. Off-site Backup (tape) Connected to our backup server is a tape drive. BE backs up the external disks to tape once a week which gets picked up by our off-site storage company. Obviously we rotate two tape libraries, one is always here and one is always there. Requirements: Eliminate the need for tape and off-site storage service by doing disk-to-disk at each site and replicating site #1 to site #2 and vice versa. Software based solution as hardware options have been too pricey (ie, SonicWall, Arkeia). Agents for Exchange, SharePoint, and SQL. Some Ideas So Far: Storage DroboPro at each site with an initial 8TB of storage (these are expandable up to 16TB at present). I like these because they are rackmountable, allow disparate drives, and have iSCSI interfaces. They are relatively cheap too. Software Symantec Backup Exec 12.5 already has all the agents and licenses we need. I'd like to keep using it unless there is a better solution, similarly priced, that does everything BE does plus deduplication and replication. Server Because there is no more need for a SCSI adapter (for tape drive) we are going to virtualize our backup server as it is currently the only physical machine save for SQL boxes. Problems: When replicating between sites we want as little data as possible to go across the pipe. There is no deduplication or compression in what I have laid out here so far. The files being replicated are BE's virtual tape libraries from our disk-to-disk backup. Because of this each of those huge files will go across the wire every week because they change every day. And Finally, the Question: Is there any software out there that does deduplication, or at least compression, to handle just our site-to-site replication? Or, looking at our setup, is there any other solution that I am missing that might be cheaper, faster, better? Thanks. Sorry so long. UPDATE 2: I've set a bounty on this question to get it more attention. I'm looking for software that will handle replication of data between two sites using the least amount of data possible (either compression, deduplication, or some other method). Something similar to rsync would work but it needs to be native to Windows and not a port involving shenanigans to get up and running. Prefer a GUI based product and I don't mind shelling out a few bones if it works. Please, answers that meet the above criteria only. If you don't think one exists or if you think I'm being to restrictive keep it to yourself. If after seven days there is no answer at all, so be it. Thanks again everyone. UPDATE 2: I really appreciate everyone coming forward with suggestions. There is no way for me to try all of these before the bounty expires. For now I'm going to let this bounty run out and whoever has the most votes will get the 100 rep points. Thanks again!

    Read the article

  • De-duplicating backup tool on a block basis? [closed]

    - by SST
    I am looking for an (ideally free as in speech or beer) backup tool for Unix-like OS which can store deduplicated backups, i.e. only nonredundant content takes up additional space. I already looked at dirvish (my first candidate) and rsnapshot which use hardlinks to achieve deduplication on a per-file level. However, as I want to back up large files (Thunderbird mailboxes 3GB, VMware images 10GB), such file are stored again entirely even if just a few bytes change. Then there are rsync-based tools like rdiff-backup which only store deltas and a current mirror. However, as the deltas are generated against each previous mirror, it is difficult to fine-tune the retention granularity (only keep one backup after a week, etc.) because the deltas would have to be re-evaluated. Another approach is to partition content into blocks and store each block only if it is not stored yet, otherwise just linking it to the first occurrence. The only tool I know of that does this by now is obnam (http://liw.fi/obnam), and it even supports zlib-compression and gpg-encryption -- nice! But it is very slow, AFAICT. Does any one know any other, solid backup software which supports deduplication on a sub-file level, ideally with at least some management options (show/select/delete generations...)?

    Read the article

1 2 3 4  | Next Page >