Stream tar.gz file from FTP server

Posted by linker on Server Fault
Published on 2012-06-26T19:41:03Z Indexed on 2012/06/26 21:17 UTC

Here is the situation: I have a tar.gz file on an FTP server, and it can contain an arbitrary number of files.

What I'm trying to accomplish is to have this file streamed and uploaded to HDFS through a Hadoop job. The fact that it's Hadoop is not important; in the end, what I need to do is write a shell script that takes this file from the FTP server with wget and writes its output to a stream.

The reason why I really need to use streams is that there will be a large number of these files, and each file will be huge.

It's fairly easy to do if I have a gzipped file and I'm doing something like this:

wget -O - "ftp://${user}:${pass}@${host}/$file" | zcat

But I'm not even sure whether this is possible for a tar.gz file, especially since there are multiple files in the archive. I'm a bit confused about which direction to take, so any help would be greatly appreciated.
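
One possible direction, offered only as a sketch: tar can read an archive from stdin and write the contents of its members to stdout, so the same pipeline shape as the zcat example may work. The helper name `stream_tar_members` and the HDFS destination path below are hypothetical. The sketch also assumes it is acceptable for all files in the archive to be concatenated into a single stream.

```shell
#!/bin/sh
# Sketch: read a tar.gz archive from stdin and write the contents of every
# file in it to stdout, without storing the archive on disk.
# stream_tar_members is a hypothetical helper name, not from the original post.
stream_tar_members() {
    # -x extract, -z gunzip, -O send file contents to stdout,
    # -f - read the archive from stdin
    tar -xzOf -
}

# Intended use with the wget command from the question (not run here;
# /some/hdfs/path is a placeholder):
#   wget -O - "ftp://${user}:${pass}@${host}/$file" \
#     | stream_tar_members \
#     | hadoop fs -put - /some/hdfs/path
```

Because `-O` concatenates every member, file boundaries are lost in the output. If each file has to be handled separately, GNU tar's `--to-command` option, which runs a command once per member and passes the member name in `TAR_FILENAME`, might be worth a look. It is specific to GNU tar.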

© Server Fault or respective owner
