What is the best approach to iterate over a directory and "store" its files in C (Linux)?

Posted by Andrei Ciobanu on Stack Overflow
Published on 2010-04-23T08:14:23Z

I have written a function that checks whether two files are duplicates. Its signature is:

int check_dup_memmap(char *f1_name, char *f2_name)

It returns:

  • (-1) - If something went wrong;
  • (0) - If the two files are identical;
  • (+1) - If the two files are different.
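For reference, here is a minimal sketch of how a function with this contract could be written using mmap() and memcmp(). It is a hypothetical re-implementation, not the author's code (which is not shown), and it assumes both names refer to regular files small enough to be mapped into the address space at once.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical re-implementation of the contract above:
 * -1 on error, 0 if the files have identical contents, +1 otherwise. */
int check_dup_memmap(char *f1_name, char *f2_name)
{
    int fd1 = -1, fd2 = -1, result = -1;
    struct stat s1, s2;
    size_t len = 0;
    void *m1 = MAP_FAILED, *m2 = MAP_FAILED;

    if ((fd1 = open(f1_name, O_RDONLY)) < 0 || (fd2 = open(f2_name, O_RDONLY)) < 0)
        goto out;
    if (fstat(fd1, &s1) != 0 || fstat(fd2, &s2) != 0)
        goto out;

    /* Files of different sizes can never be duplicates: skip the mapping. */
    if (s1.st_size != s2.st_size) {
        result = 1;
        goto out;
    }
    /* mmap() rejects a zero length, so two empty files are handled here. */
    len = (size_t)s1.st_size;
    if (len == 0) {
        result = 0;
        goto out;
    }

    m1 = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd1, 0);
    m2 = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd2, 0);
    if (m1 == MAP_FAILED || m2 == MAP_FAILED)
        goto out;

    result = (memcmp(m1, m2, len) == 0) ? 0 : 1;

out:
    /* Release only what was actually acquired. */
    if (m1 != MAP_FAILED) munmap(m1, len);
    if (m2 != MAP_FAILED) munmap(m2, len);
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return result;
}
```

Comparing the sizes first means most pairs of different files are rejected after two fstat() calls, without touching their contents.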

The next step is to write a function that iterates through all the files in a given directory, applies the previous function, and reports every duplicate it finds.
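A minimal sketch of that reporting step, assuming the paths have already been collected into an array: every unordered pair is compared exactly once, so n files cost n(n-1)/2 calls. `report_duplicates` and `dup_check_fn` are hypothetical names; the comparison is passed in as a function pointer so that check_dup_memmap, or anything else with the same signature and return convention, can be plugged in.

```c
#include <stddef.h>
#include <stdio.h>

/* Comparison callback with the same contract as check_dup_memmap():
 * -1 on error, 0 for duplicates, +1 for different files. */
typedef int (*dup_check_fn)(char *, char *);

/* Compares every unordered pair of paths once, prints each duplicate
 * pair to `out`, and returns the number of duplicate pairs found. */
size_t report_duplicates(char **paths, size_t n, dup_check_fn check, FILE *out)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            int r = check(paths[i], paths[j]);
            if (r == 0) {
                fprintf(out, "duplicate: %s == %s\n", paths[i], paths[j]);
                found++;
            } else if (r < 0) {
                /* Report the failure but keep checking the remaining pairs. */
                fprintf(stderr, "cannot compare %s and %s\n", paths[i], paths[j]);
            }
        }
    }
    return found;
}
```

With the real function, the call would look like report_duplicates(paths, n, check_dup_memmap, stdout).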

Initially I thought of writing a function that generates a file listing all the filenames in a given directory, then reading that file again and again to compare every pair of files. Here is that version of the function, which collects all the filenames in a given directory:

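#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* MFILE_LEN and util_get_cwd() are defined elsewhere in the program. */

/* Writes the full path of every non-directory entry under dirname to f,
 * recursing into subdirectories. Uses chdir() to walk the tree, which
 * affects the working directory of the whole process. */
void build_dir_tree(char *dirname, FILE *f)
{
    DIR *cdir = NULL;
    struct dirent *ent = NULL;
    struct stat buf;
    if(f == NULL){
        fprintf(stderr, "NULL file submitted. [build_dir_tree].\n");
        exit(EXIT_FAILURE);
    }
    if(dirname == NULL){
        fprintf(stderr, "NULL dirname submitted. [build_dir_tree].\n");
        exit(EXIT_FAILURE);
    }
    if((cdir = opendir(dirname)) == NULL){
        char emsg[MFILE_LEN];
        snprintf(emsg, sizeof emsg, "Cannot open dir: %s [build_dir_tree]\t", dirname);
        perror(emsg);
        return;     /* cdir is NULL: readdir() below would crash */
    }
    if(chdir(dirname) != 0){
        perror("chdir [build_dir_tree]");
        closedir(cdir);
        return;
    }
    while ((ent = readdir(cdir)) != NULL) {
        /* lstat() does not follow symlinks, so linked directories are not entered */
        if (lstat(ent->d_name, &buf) != 0) {
            continue;   /* entry vanished or cannot be stat'ed: skip it */
        }
        if (S_ISDIR(buf.st_mode)) {
            if (strcmp(".", ent->d_name) == 0 ||
                    strcmp("..", ent->d_name) == 0) {
                continue;
            }
            build_dir_tree(ent->d_name, f);
        }
        else{
            fprintf(f, "/%s/%s\n",util_get_cwd(),ent->d_name);
        }
    }
    if(chdir("..") != 0){
        perror("chdir .. [build_dir_tree]");
    }
    closedir(cdir);
}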

Still, I consider this approach somewhat inefficient, since I have to parse the file again and again.

In your opinion, which other approach should I follow?

  • Write a data structure that holds the filenames in memory instead of writing them to a file? I think that for a directory with a lot of files, the memory will become very fragmented.
  • Hold all the filenames in an auto-expanding array, so that I can easily access every file by its index, because they will be in a contiguous memory location.
  • Map this file into memory using mmap()? But mmap may fail if the file gets too big.
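The second option can also address the fragmentation concern from the first: rather than allocating each name separately, all names can be appended to one growing character buffer, with a second growing array holding each name's starting offset. In the sketch below (`name_pool`, `pool_add`, `pool_get` and `pool_free` are hypothetical names), both arrays double their capacity when full, so appends are amortized O(1) and the finished list occupies just two heap blocks. Offsets are stored instead of pointers because realloc() may move the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* All names stored back to back ("a\0b\0..."), plus the offset of each. */
struct name_pool {
    char   *buf;      /* contiguous, NUL-separated names */
    size_t  used, cap;
    size_t *off;      /* off[i] = start of name i inside buf */
    size_t  count, off_cap;
};

/* Appends a copy of `name`; returns 0 on success, -1 if out of memory. */
int pool_add(struct name_pool *p, const char *name)
{
    size_t len = strlen(name) + 1;   /* include the terminating NUL */

    if (p->used + len > p->cap) {
        size_t ncap = p->cap ? p->cap : 4096;
        while (p->used + len > ncap)
            ncap *= 2;               /* double until the new name fits */
        char *nbuf = realloc(p->buf, ncap);
        if (nbuf == NULL)
            return -1;
        p->buf = nbuf;
        p->cap = ncap;
    }
    if (p->count == p->off_cap) {
        size_t ncap = p->off_cap ? p->off_cap * 2 : 256;
        size_t *noff = realloc(p->off, ncap * sizeof *noff);
        if (noff == NULL)
            return -1;
        p->off = noff;
        p->off_cap = ncap;
    }
    memcpy(p->buf + p->used, name, len);
    p->off[p->count++] = p->used;
    p->used += len;
    return 0;
}

/* Name i; the pointer is valid only until the next pool_add(). */
const char *pool_get(const struct name_pool *p, size_t i)
{
    return p->buf + p->off[i];
}

/* Frees both blocks and resets the pool to the empty state. */
void pool_free(struct name_pool *p)
{
    free(p->buf);
    free(p->off);
    memset(p, 0, sizeof *p);
}
```

A pool declared as `struct name_pool pool = {0};` starts empty; after the directory walk, the pool.count names are reachable by index through pool_get(&pool, i).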

Any opinions on this? I want to choose the most efficient path and use as few resources as possible; this is a requirement of the program.

EDIT: Is there a way to get the number of files in a given directory without iterating through it?
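As far as I know, neither POSIX nor Linux offers a call that returns the number of entries in a directory without reading it; in general, the directory's stat() fields do not provide that count. A readdir() pass is cheap, though, because it only reads directory entries and never opens the files themselves. A minimal sketch, with the hypothetical name `count_entries`:

```c
#include <dirent.h>
#include <string.h>

/* Counts the entries of one directory (not recursive), excluding "." and "..".
 * Subdirectories are counted too. Returns -1 if the directory cannot be opened. */
long count_entries(const char *dirname)
{
    DIR *dir = opendir(dirname);
    struct dirent *ent;
    long n = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            n++;
    }
    closedir(dir);
    return n;
}
```

The count could be used to size the filename array before a second pass, but the directory may change between the two passes, so the array should still be able to grow.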
