Search Results

Search found 77 results on 4 pages for 'deduplication'.

Page 2/4 | < Previous Page | 1 2 3 4  | Next Page >

  • Single Instance Storage layers

    - by Moo
    Hi, I have a data storage requirement which is an excellent candidate for single instance storage and deduplication. Can anyone suggest any .Net compatible libraries or systems which handles SIS and deduplication, either with SQL Server as an actual back end or its own high performance storage engine? What have peoples experiences been with such engines, and are there any pit falls to watch out for? Regards Moo

    Read the article

  • Experiences with eXdupe?

    - by ewwhite
    I noticed that the eXdupe compression/archiving/deduplication utility was recently mentioned in another post here. It boasts some interesting features, and I've been playing with it for the past day. It's basically a cross-platform, highly multithreaded archival tool. http://exdupe.com/index.html I'm curious if anyone here uses it in production or has any tips on how to leverage the tool in their environment. I'm looking for suggestions.

    Read the article

  • Access MP3 audio data independently of ID3 tags?

    - by kyl191
    Hi, this is a 2 part question. First off, is it possible to access the audio data in an MP3 independently of the ID3 tags, and secondly, is there any way to do so using available libraries? I recently consolidated my music collection from 3 computers and ended up with songs which had changed ID3 tags, but the audio data itself was unmodified. Running a search for duplicate files failed because the file changed with the ID3 tag change, but I think it should be possible to identify duplicate files if I just run a deduplication using the audio data for comparison. I know that it's possible to seek to a particular position past the ID3 header in the file, and directly read the data, but was wondering if there's a library that would expose the audio data so I could just extract the data, run a checksum on it, and store the computed result somewhere, then look for identical checksums. (Also, I'd probably have to use some kind of library when you take into account variable length headers.)

    Read the article

  • HP D2D 4312 Bacula configuration

    - by krisdigitx
    I have configured 5 libraries on the HP D2D system Discovery on the Bacula server shows only the last library and not all libraries. Why? [root@server bacula]# iscsiadm --mode discovery --type sendtargets --portal 10.66.59.114 10.66.59.114:3260,1 iqn.1986-03.com.hp:storage.d2dbs.czj2020vvy.50014380075dca5e.library12.drive1 10.66.59.114:3260,1 iqn.1986-03.com.hp:storage.d2dbs.czj2020vvy.50014380075dcaf2.library12.robotics I can query it fine using... [root@server bacula]# mtx -f /dev/sg2 inquiry Product Type: Tape Drive Vendor ID: 'HP ' Product ID: 'Ultrium 5-SCSI ' Revision: 'ED51' Attached Changer API: No [root@bray bacula]# mtx -f /dev/sg3 inquiry Product Type: Medium Changer Vendor ID: 'HP ' Product ID: 'MSL G3 Series ' Revision: 'EL41' Attached Changer API: No [root@server bacula]# mtx -f /dev/sg3 status Storage Changer /dev/sg3:1 Drives, 97 Slots ( 1 Import/Export ) Data Transfer Element 0:Empty Storage Element 1:Full :VolumeTag=50507F82 Storage Element 2:Full :VolumeTag=50507F83 Storage Element 3:Full :VolumeTag=50507F84 Storage Element 4:Full :VolumeTag=50507F85 Storage Element 5:Full :VolumeTag=50507F86 Storage Element 6:Full :VolumeTag=50507F87 Storage Element 7:Full :VolumeTag=50507F88 Does anyone have any good documentation for implementing Bacula with an HP D2D tape drive for server backups, and how to allocate libraries?

    Read the article

  • ZFS - destroying deduplicated zvol or data set stalls the server. How to recover?

    - by ewwhite
    I'm using Nexentastor on a secondary storage server running on an HP ProLiant DL180 G6 with 12 Midline (7200 RPM) SAS drives. The system has an E5620 CPU and 8GB RAM. There is no ZIL or L2ARC device. Last week, I created a 750GB sparse zvol with dedup and compression enabled to share via iSCSI to a VMWare ESX host. I then created a Windows 2008 file server image and copied ~300GB of user data to the VM. Once happy with the system, I moved the virtual machine to an NFS store on the same pool. Once up and running with my VMs on the NFS datastore, I decided to remove the original 750GB zvol. Doing so stalled the system. Access to the Nexenta web interface and NMC halted. I was eventually able to get to a raw shell. Most OS operations were fine, but the system was hanging on the zfs destroy -r vol1/filesystem command. Ugly. I found the following two OpenSolaris bugzilla entries and now understand that the machine will be bricked for an unknown period of time. It's been 14 hours, so I need a plan to be able to regain access to the server. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6924390 and http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=593704962bcbe0743d82aa339988?bug_id=6924824 In the future, I'll probably take the advice given in one of the buzilla workarounds: Workaround Do not use dedupe, and do not attempt to destroy zvols that had dedupe enabled. Update: I had to force the system to power off. Upon reboot, the system stalls at Importing zfs filesystems. It's been that way for 2 hours now.

    Read the article

  • ZFS - destroying deduplicated zvol or data set stalls the server. How to recover?

    - by ewwhite
    I'm using Nexentastor on a secondary storage server running on an HP ProLiant DL180 G6 with 12 Midline (7200 RPM) SAS drives. The system has an E5620 CPU and 8GB RAM. There is no ZIL or L2ARC device. Last week, I created a 750GB sparse zvol with dedup and compression enabled to share via iSCSI to a VMWare ESX host. I then created a Windows 2008 file server image and copied ~300GB of user data to the VM. Once happy with the system, I moved the virtual machine to an NFS store on the same pool. Once up and running with my VMs on the NFS datastore, I decided to remove the original 750GB zvol. Doing so stalled the system. Access to the Nexenta web interface and NMC halted. I was eventually able to get to a raw shell. Most OS operations were fine, but the system was hanging on the zfs destroy -r vol1/filesystem command. Ugly. I found the following two OpenSolaris bugzilla entries and now understand that the machine will be bricked for an unknown period of time. It's been 14 hours, so I need a plan to be able to regain access to the server. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6924390 and http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=593704962bcbe0743d82aa339988?bug_id=6924824 In the future, I'll probably take the advice given in one of the buzilla workarounds: Workaround Do not use dedupe, and do not attempt to destroy zvols that had dedupe enabled. Update: I had to force the system to power off. Upon reboot, the system stalls at Importing zfs filesystems. It's been that way for 2 hours now.

    Read the article

  • How to deduplicate 40TB of data?

    - by Michael Stauffer
    I've inherited a research cluster with ~40TB of data across three filesystems. The data stretches back almost 15 years, and there are most likely a good amount of duplicates as researchers copy each others data for different reasons and then just hang on to the copies. I know about de-duping tools like fdupes and rmlint. I'm trying to find one that will work on such a large dataset. I don't care if it takes weeks (or maybe even months) to crawl all the data - I'll probably throttle it anyway to go easy on the filesystems. But I need to find a tool that's either somehow super efficient with RAM, or can store all the intermediary data it needs in files rather than RAM. I'm assuming that my RAM (64GB) will be exhausted if I crawl through all this data as one set. I'm experimenting with fdupes now on a 900GB tree. It's 25% of the way through and RAM usage has been slowly creeping up the whole time, now it's at 700MB. Or, is there a way to direct a process to use disk-mapped RAM so there's much more available and it doesn't use system RAM? I'm running CentOS 6.

    Read the article

  • Proper use of disk to disk to tape backup using de-duplication and LTO5

    - by Michael
    I currently have ~12TB of data for a full disk to tape (LTO3) backup. Needless to say, it's now requiring over 16 tapes so I'm looking at other solutions. Here is what I've come up with. I'd like to hear the community's thoughts. Server for Disk-to-Disk BackupExec 2010 Using De-duplication Technology 20+TB worth of SATA drives LTO5 robotic library connected via SAS 1Gbps NIC connected to network What I envision is doing a full backup of my entire network which will initially take a long time over the 1Gbps NIC but once the de-duplication kicks in backups should be quick. I will then use the LTO5 to make disk to tape backups and archive those accordingly. What does everyone think? Any faster way of doing the initial full backup over the 1Gbps NIC? What will be my pain points? Is there a better way of doing what I'm trying to achieve?

    Read the article

  • Windows 7 file-based backup service

    - by Ben Voigt
    I'm looking for a good replacement for Lazy Mirror, since it doesn't support Windows 7 well. Pros: One of the things I really loved about Lazy Mirror is that it always maintains a "full" backup, but does so by only copying modified files. As each file was copied, the old version got archived (moved to an out-of-the-way location). So after mirroring ran, there'd be a complete copy of the file system, which could even be booted if necessary. At the same time, extra space on the backup media was used to store as many older versions of files as possible, without wasting space storing multiple copies of the same version. It seems that with Windows 7 backup, there'd be wasted space storing the same data in both the system image and file backup. It was completely file-based, but also aware of the registry (it had a feature to dump the live registry to hive files in the correct format). The backups were normal NTFS filesystems, no special tool was needed to read them. It automatically cleaned out the oldest previous versions when space ran out (unlike Windows 7 backup which apparently simply starts failing the the backup media fills.) It copied all file attributes including security. Cons: It doesn't deal well with junction points, symbolic links, and hard links. It didn't run as a service without lots of help from firesrv or srvany, and then you couldn't interact with the GUI. Running as a service was necessary to be able to mirror protected OS files. It didn't have open file handling, except for registry hives. I guess that the file-by-file archive and replacement could leave mismatched sets of files, if the mirror was interrupted. This would be the advantage of incremental backup techniques that require old full backup + all intermediate incremental backups to restore. But I don't see this as presenting much of a problem, you'd really only have a boot failure if you had a mixture of pre- and post-service pack files, and I can run a full image backup using another tool before applying a service pack. Does anyone know of a tool that does both full-system backup and storage of old versions of files like Lazy Mirror did (without storing the same data multiple times), and also can run as a service in Windows 7? Free is best of course, but a reasonably priced paid program (e.g. It would be absolutely awesome if it also triggered a backup/mirror pass when a particular external drive was plugged in and generated popup warnings if backups hadn't been run recently)

    Read the article

  • Duplicate file finder

    - by Andrija
    I need free duplicate file finder/remover app, with ability to find duplicate files/folders by name and/or by size and to remove one of duplicates. Can you please recommend any? and why? Thanks EDIT: Changed to CW. Please add more apps to list if you know any.

    Read the article

  • Program for remove exact duplicate files while caching search results

    - by John Thomas
    We need a Windows 7 program to remove/check the duplicates but our situation is somewhat different than the standard one for which there are enough programs. We have a fairly large static archive (collection) of photos spread on several disks. Let's call them Disk A..M. We have also some disks (let's call them Disk 1..9) which contain some duplicates which are to be found on disks A..M. We want to add to our collection new disks (N, O, P... aso.) which will contain the photos from disks 1..9 but, of course, we don't want to have any photos two (or more) times. Of course, theoretically, the task can be solved with a regular file duplicate remover but the time needed will be very big. Ideally, AFAIS now, the real solution would be a program which will scan the disks A..M, store the file sizes/hashes of the photos in an indexed database/file(s) and will check the new disks (1..9) against this database. However I have hard time to find such a program (if exists). Other things to note: we consider that the Disks A..M (the collection) doesn't have any duplicates on them the file names might be changed we aren't interested in approximated (fuzzy) comparison which can be found in some photo comparing programs. We hunt for exact duplicate files. we aren't afraid of command line. :-) we need to work on Win7/XP we prefer (of course) to be freeware TIA for any suggestions, John Th.

    Read the article

  • Is there a way to remove duplicate emails from a remote account?

    - by Mister IT Guru
    I have a user who has multiple duplicated emails across multiple folders on his IMAP account. How he managed to create them is beyond me, but mine is not to reason why, just to fix it! Can anyone recommend an application that I may use to remove the duplicates. (we're talking mailboxes in excess of 9G, and it's a remote server) I don't mind what OS I have to use to clean up the mailbox, I'm just looking for some recommendations. Thanks

    Read the article

  • How to de-dupe identical photos that have a slightly different file size?

    - by GJ.
    I imported many photos using the new "camera import" feature of Dropbox. Many of those were duplicates of photos previously imported by direct copying from the camera. Strangely, the Dropbox import appears to slightly reduce the file size. E.g. here on the right is the file imported through Dropbox: Comparison of the two files using pdiff returns "Images are binary identical", but tools such as fdupes or even the Picasa "show duplicate files" feature, consider them as unique. What can be the cause of this file size change? Is there any way to undo it? Most importantly: how can I de-dupe efficiently without regard to file size comparison? (running pdiff comparison over all photo pairs in my library is obviously impractical...) A solution for either OS X or Windows would do.

    Read the article

  • Feedback on available mid-to-enterprise level desktop backup solutions [closed]

    - by user85610
    I am involved in the creation of a new backup solution to replace our current Retrospect setup, which has become a significant time sink to administer. We have almost 200 desktop and some laptop clients, both Windows and OS X. We're only interested in products oriented around disk-to-disk, and would be integrate well with our current set of nine NAS devices as target storage. I'd just like some feedback from anyone out there, as it's sometimes difficult otherwise to find objective reviews of software at this level. Both data and time are important enough that we need a reliable solution which won't be prone to self-destruction as often as Retrospect. Bonus points for de-duplication, which might help squeeze more service time out of our NAS setup in terms of capacity. Currently considering Commvault and Netbackup. Many other products I've seen don't have an OSX client. Any thoughts?

    Read the article

  • Painless way of consolidating files across multiple machines/OSes

    - by 5arx
    Just bought a NAS. So I thought I'd get all our photos, media files and pdfs consolidated, de-duplicated, de-junked and virus-checked and stick them all on it. We have 3 laptops, one running Windows, the others OSX. We have a file server running Windows - it was the result of an earlier attempt at a networked fileserver - and a Mac Pro that is also kind of a server (previous attempts at this job have resulted in most of our stuff being on it). Also memory cards/sticks, cd backups and so on. I would be grateful if anyone could suggest a strategy or, ideally, tool(s) I could use to solve this problem. It is probably no more than one or two terabytes of data in total, but I can imagine that going through it all manually, file-by-file may well drive me insane.

    Read the article

  • Duplicate name exists solution

    - by user978733
    I have about 70 pc's with exactly same hardware. I decided to automate turning on and off. I took 1 PC. Here is what I've done: Changed bios configuration so that now pc's waking when I turn on AC switch Installed Windows XP and configured so that I can turn off remotelly, changed workgroup name to "WG1", and pc name to "ExamPC" Then created acronis backup image of this pc I installed this image in several PC's and tried to test All worked well till windows opened. The problem is, all tested PC's started Windows nearly at the same time, and all of them popped up error Duplicate name exist. I can't figure out any solution. Any suggestions?

    Read the article

  • Ways to deduplicate files

    - by User1
    I want to simply backup and archive the files on several machines. Unfortunately, the files have some large files that are the same file but stored differently on different machines. For instance, there may a few hundred photos that were copied from one computer to the other as an ad-hoc backup. Now that I want to make a common repository of files, I don't want several copies of the same photo. If I copy all of these files to a single directory, is there a tool that can go thru and recognize duplicate files and give me a list or even delete one of the duplicates?

    Read the article

  • Technical details for Server 2012 de-duplication feature

    - by syneticon-dj
    Now that Windows Server 2012 comes with de-duplication features for NTFS volumes I am having a hard time finding technical details about it. I can deduce from the TechNet documentation that the de-duplication action itself is an asynchronous process - not unlike how the SIS Groveler used to work - but there is virtually no detail about the implementation (algorithms used, resources needed, even the info on performance considerations is nothing but a bunch rule-of-thumb-style recommendations). Insights and pointers are greatly appreciated, a comparison to Solaris' ZFS de-duplication efficiency for a set of scenarios would be wonderful.

    Read the article

  • Is there such a thing as a file hosted container which deduplicates data held within?

    - by Mallow
    Background I have backups of a website which stores all of it's data into a single file. This file is several gigs large and I have many different backups of this file. Most of the data within is mostly the same plus whatever was added or changed to it. I want to keep all the concurrent backups I've made through the years in case I find a horrible surprise of data corruption along the line. However storing a 10gig file every month gets expensive. Seeking Solution I've often thought about different ways of alleviating this problem. One thought that comes up very often combines the idea of a duplicating file system which doesn't require it's own partitioned volume on a hard drive. Something like what truecrypt does, what it calls, "file hosted containers" which when using the truecrypt program allows you to mount and dismount that volume as a regular hard drive. Question Is there a virtual hard drive mounter which uses file-based container which uses data deduplicaiton file system? (This question is a little awkward to put into words, if you have a better idea on how to ask this question please feel free to help out.)

    Read the article

  • Find RARs with duplicate content

    - by Scott McClenning
    I need a utility to find RAR files that contain duplicate data (i.e. files within the RAR that hash the same, but could have different names). I can open the RARs and see the CRCs are the same, but I was hoping for a more automated process that would work in bulk (hundreds of files). Hashing the overall RAR won't help because the file contained within could have different names, or the archive could be compressed at different levels. If needed, a utility that would extract the contents of the RARs and then compare would work, but is not preferred. I would prefer a free utility for Windows, but a pay utility or a utility for Linux would be acceptable.

    Read the article

  • What's the best way to copy deduplicated files onto a new Server 2012 drive?

    - by Screndib
    We have a deduplicated volume on a Windows Server 2012 machine that is approaching it's limits. It is a 1.3TB drive with ~10TB of duplicated data. We want to copy all of this data onto a larger 4TB drive. What is the best way to perform this copy such that we only copy the 1.3TB of deduplicated data instead of unpacking the entire 10TB and repacking it on the other end? edit: I attempted a standard explorer file copy and a Copy-Item but neither appeared to be dedup-aware. I didn't run either to completion however so I can't say this is the case for sure.

    Read the article

  • Find copies of folders? (Not files)

    - by acidzombie24
    I have a dozen of folders that are duplicates. Within them are a few dozen folders that are duplicates so i have a few thousand copies of the same files and folders. Many of them are exactly the same while others have changes in a few files. What utility can i use to delete folders that are copies of others with no changes? if one or more files in that folder have been changed i dont want it deleted (and i'd like the subfolders to have a shortcuts to a copy but thats not required). Is there a utility to do this?

    Read the article

  • Are there free, low cost, or open source tools for matching name/address data?

    - by luiscolorado
    This question is related to Tools for matching name/address data. There is a number commercial tools provided by SAS, Oracle, Microsoft, etc., that allow to de-duplicate or merging names of individuals or companies coming from multiple sources. However, after reading the answers to the question mentioned before, I wondered why a seemingly interesting problem didn't receive any answers mentioning open source projects that could tackle the problem. Are you aware of any open source projects or algorithms to implement the so called "record linking", "record merging", or "clustering"?

    Read the article

  • Datastructure choices for highspeed and memory efficient detection of duplicate of strings

    - by Jonathan Holland
    I have a interesting problem that could be solved in a number of ways: I have a function that takes in a string. If this function has never seen this string before, it needs to perform some processing. If the function has seen the string before, it needs to skip processing. After a specified amount of time, the function should accept duplicate strings. This function may be called thousands of time per second, and the string data may be very large. This is a highly abstracted explanation of the real application, just trying to get down to the core concept for the purpose of the question. The function will need to store state in order to detect duplicates. It also will need to store an associated timestamp in order to expire duplicates. It does NOT need to store the strings, a unique hash of the string would be fine, providing there is no false positives due to collisions (Use a perfect hash?), and the hash function was performant enough. The naive implementation would be simply (in C#): Dictionary<String,DateTime> though in the interest of lowering memory footprint and potentially increasing performance I'm evaluating a custom data structures to handle this instead of a basic hashtable. So, given these constraints, what would you use? EDIT, some additional information that might change proposed implementations: 99% of the strings will not be duplicates. Almost all of the duplicates will arrive back to back, or nearly sequentially. In the real world, the function will be called from multiple worker threads, so state management will need to be synchronized.

    Read the article

  • Oracle Solaris 11 ZFS Lab for Openworld 2012

    - by user12626122
    Preface This is the content from the Oracle Openworld 2012 ZFS lab. It was well attended - the feedback was that it was a little short - thats probably because in writing it I bacame very time-concious after the ASM/ACFS on Solaris extravaganza I ran last year which was almost too long for mortal man to finish in the 1 hour session. Enjoy. Table of Contents Exercise Z.1: ZFS Pools Exercise Z.2: ZFS File Systems Exercise Z.3: ZFS Compression Exercise Z.4: ZFS Deduplication Exercise Z.5: ZFS Encryption Exercise Z.6: Solaris 11 Shadow Migration Introduction This set of exercises is designed to briefly demonstrate new features in Solaris 11 ZFS file system: Deduplication, Encryption and Shadow Migration. Also included is the creation of zpools and zfs file systems - the basic building blocks of the technology, and also Compression which is the compliment of Deduplication. The exercises are just introductions - you are referred to the ZFS Adminstration Manual for further information. From Solaris 11 onward the online manual pages consist of zpool(1M) and zfs(1M) with further feature-specific information in zfs_allow(1M), zfs_encrypt(1M) and zfs_share(1M). The lab is easily carried out in a VirtualBox running Solaris 11 with 6 virtual 3 Gb disks to play with. Exercise Z.1: ZFS Pools Task: You have several disks to use for your new file system. Create a new zpool and a file system within it. Lab: You will check the status of existing zpools, create your own pool and expand it. Your Solaris 11 installation already has a root ZFS pool. It contains the root file system. Check this: root@solaris:~# zpool list NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT rpool 15.9G 6.62G 9.25G 41% 1.00x ONLINE - root@solaris:~# zpool status pool: rpool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 c3t0d0s0 ONLINE 0 0 0 errors: No known data errors Note the disk device the root pool is on - c3t0d0s0 Now you will create your own ZFS pool. First you will check what disks are available: root@solaris:~# echo | format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c3t0d0 <ATA-VBOX HARDDISK-1.0 cyl 2085 alt 2 hd 255 sec 63> /pci@0,0/pci8086,2829@d/disk@0,0 1. c3t2d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32> /pci@0,0/pci8086,2829@d/disk@2,0 2. c3t3d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32> /pci@0,0/pci8086,2829@d/disk@3,0 3. c3t4d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32> /pci@0,0/pci8086,2829@d/disk@4,0 4. c3t5d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32> /pci@0,0/pci8086,2829@d/disk@5,0 5. c3t6d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32> /pci@0,0/pci8086,2829@d/disk@6,0 6. c3t7d0 <ATA-VBOX HARDDISK-1.0 cyl 1534 alt 2 hd 128 sec 32> /pci@0,0/pci8086,2829@d/disk@7,0 Specify disk (enter its number): Specify disk (enter its number): The root disk is numbered 0. The others are free for use. Try creating a simple pool and observe the error message: root@solaris:~# zpool create mypool c3t2d0 c3t3d0 'mypool' successfully created, but with no redundancy; failure of one device will cause loss of the pool So destroy that pool and create a mirrored pool instead: root@solaris:~# zpool destroy mypool root@solaris:~# zpool create mypool mirror c3t2d0 c3t3d0 root@solaris:~# zpool status mypool pool: mypool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c3t2d0 ONLINE 0 0 0 c3t3d0 ONLINE 0 0 0 errors: No known data errors Back to topExercise Z.2: ZFS File Systems Task: You have to create file systems for later exercises. You can see that when a pool is created, a file system of the same name is created: root@solaris:~# zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 86.5K 2.94G 31K /mypool Create your filesystems and mountpoints as follows: root@solaris:~# zfs create -o mountpoint=/data1 mypool/mydata1 The -o option sets the mount point and automatically creates the necessary directory. root@solaris:~# zfs list mypool/mydata1 NAME USED AVAIL REFER MOUNTPOINT mypool/mydata1 31K 2.94G 31K /data1 Back to top Exercise Z.3: ZFS Compression Task:Try out different forms of compression available in ZFS Lab:Create 2nd filesystem with compression, fill both file systems with the same data, observe results You can see from the zfs(1) manual page that there are several types of compression available to you, set with the property=value syntax: compression=on | off | lzjb | gzip | gzip-N | zle Controls the compression algorithm used for this dataset. The lzjb compression algorithm is optimized for performance while providing decent data compression. Setting compression to on uses the lzjb compression algorithm. The gzip compression algorithm uses the same compression as the gzip(1) command. You can specify the gzip level by using the value gzip-N where N is an integer from 1 (fastest) to 9 (best compression ratio). Currently, gzip is equivalent to gzip-6 (which is also the default for gzip(1)). Create a second filesystem with compression turned on. Note how you set and get your values separately: root@solaris:~# zfs create -o mountpoint=/data2 mypool/mydata2 root@solaris:~# zfs set compression=gzip-9 mypool/mydata2 root@solaris:~# zfs get compression mypool/mydata1 NAME PROPERTY VALUE SOURCE mypool/mydata1 compression off default root@solaris:~# zfs get compression mypool/mydata2 NAME PROPERTY VALUE SOURCE mypool/mydata2 compression gzip-9 local Now you can copy the contents of /usr/lib into both your normal and compressing filesystem and observe the results. Don't forget the dot or period (".") in the find(1) command below: root@solaris:~# cd /usr/lib root@solaris:/usr/lib# find . -print | cpio -pdv /data1 root@solaris:/usr/lib# find . -print | cpio -pdv /data2 The copy into the compressing file system takes longer - as it has to perform the compression but the results show the effect: root@solaris:/usr/lib# zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 1.35G 1.59G 31K /mypool mypool/mydata1 1.01G 1.59G 1.01G /data1 mypool/mydata2 341M 1.59G 341M /data2 Note that the available space in the pool is shared amongst the file systems. This behavior can be modified using quotas and reservations which are not covered in this lab but are covered extensively in the ZFS Administrators Guide. Back to top Exercise Z.4: ZFS Deduplication The deduplication property is used to remove redundant data from a ZFS file system. With the property enabled duplicate data blocks are removed synchronously. The result is that only unique data is stored and common componenents are shared. Task:See how to implement deduplication and its effects Lab: You will create a ZFS file system with deduplication turned on and see if it reduces the amount of physical storage needed when we again fill it with a copy of /usr/lib. root@solaris:/usr/lib# zfs destroy mypool/mydata2 root@solaris:/usr/lib# zfs set dedup=on mypool/mydata1 root@solaris:/usr/lib# rm -rf /data1/* root@solaris:/usr/lib# mkdir /data1/2nd-copy root@solaris:/usr/lib# zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 1.02M 2.94G 31K /mypool mypool/mydata1 43K 2.94G 43K /data1 root@solaris:/usr/lib# find . -print | cpio -pd /data1 2142768 blocks root@solaris:/usr/lib# zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 1.02G 1.99G 31K /mypool mypool/mydata1 1.01G 1.99G 1.01G /data1 root@solaris:/usr/lib# find . -print | cpio -pd /data1/2nd-copy 2142768 blocks root@solaris:/usr/lib#zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 1.99G 1.96G 31K /mypool mypool/mydata1 1.98G 1.96G 1.98G /data1 You could go on creating copies for quite a while...but you get the idea. Note that deduplication and compression can be combined: the compression acts on metadata. Deduplication works across file systems in a pool and there is a zpool-wide property dedupratio: root@solaris:/usr/lib# zpool get dedupratio mypool NAME PROPERTY VALUE SOURCE mypool dedupratio 4.30x - Deduplication can also be checked using "zpool list": root@solaris:/usr/lib# zpool list NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT mypool 2.98G 1001M 2.01G 32% 4.30x ONLINE - rpool 15.9G 6.66G 9.21G 41% 1.00x ONLINE - Before moving on to the next topic, destroy that dataset and free up some space: root@solaris:~# zfs destroy mypool/mydata1 Back to top Exercise Z.5: ZFS Encryption Task: Encrypt sensitive data. Lab: Explore basic ZFS encryption. This lab only covers the basics of ZFS Encryption. In particular it does not cover various aspects of key management. Please see the ZFS Adminastrion Manual and the zfs_encrypt(1M) manual page for more detail on this functionality. Back to top root@solaris:~# zfs create -o encryption=on mypool/data2 Enter passphrase for 'mypool/data2': ******** Enter again: ******** root@solaris:~# Creation of a descendent dataset shows that encryption is inherited from the parent: root@solaris:~# zfs create mypool/data2/data3 root@solaris:~# zfs get -r encryption,keysource,keystatus,checksum mypool/data2 NAME PROPERTY VALUE SOURCE mypool/data2 encryption on local mypool/data2 keysource passphrase,prompt local mypool/data2 keystatus available - mypool/data2 checksum sha256-mac local mypool/data2/data3 encryption on inherited from mypool/data2 mypool/data2/data3 keysource passphrase,prompt inherited from mypool/data2 mypool/data2/data3 keystatus available - mypool/data2/data3 checksum sha256-mac inherited from mypool/data2 You will find the online manual page zfs_encrypt(1M) contains examples. In particular, if time permits during this lab session you may wish to explore the changing of a key using "zfs key -c mypool/data2". Exercise Z.6: Shadow Migration Shadow Migration allows you to migrate data from an old file system to a new file system while simultaneously allowing access and modification to the new file system during the process. You can use Shadow Migration to migrate a local or remote UFS or ZFS file system to a local file system. Task: You wish to migrate data from one file system (UFS, ZFS, VxFS) to ZFS while mainaining access to it. Lab: Create the infrastructure for shadow migration and transfer one file system into another. First create the file system you want to migrate root@solaris:~# zpool create oldstuff c3t4d0 root@solaris:~# zfs create oldstuff/forgotten Then populate it with some files: root@solaris:~# cd /var/adm root@solaris:/var/adm# find . -print | cpio -pdv /oldstuff/forgotten You need the shadow-migration package installed: root@solaris:~# pkg install shadow-migration Packages to install: 1 Create boot environment: No Create backup boot environment: No Services to change: 1 DOWNLOAD PKGS FILES XFER (MB) Completed 1/1 14/14 0.2/0.2 PHASE ACTIONS Install Phase 39/39 PHASE ITEMS Package State Update Phase 1/1 Image State Update Phase 2/2 You then enable the shadowd service: root@solaris:~# svcadm enable shadowd root@solaris:~# svcs shadowd STATE STIME FMRI online 7:16:09 svc:/system/filesystem/shadowd:default Set the filesystem to be migrated to read-only root@solaris:~# zfs set readonly=on oldstuff/forgotten Create a new zfs file system with the shadow property set to the file system to be migrated: root@solaris:~# zfs create -o shadow=file:///oldstuff/forgotten mypool/remembered Use the shadowstat(1M) command to see the progress of the migration: root@solaris:~# shadowstat EST BYTES BYTES ELAPSED DATASET XFRD LEFT ERRORS TIME mypool/remembered 92.5M - - 00:00:59 mypool/remembered 99.1M 302M - 00:01:09 mypool/remembered 109M 260M - 00:01:19 mypool/remembered 133M 304M - 00:01:29 mypool/remembered 149M 339M - 00:01:39 mypool/remembered 156M 86.4M - 00:01:49 mypool/remembered 156M 8E 29 (completed) Note that if you had created /mypool/remembered as encrypted, this would be the preferred method of encrypting existing data. Similarly for compressing or deduplicating existing data. The procedure for migrating a file system over NFS is similar - see the ZFS Administration manual. That concludes this lab session.

    Read the article

< Previous Page | 1 2 3 4  | Next Page >