HDFS is a distributed file system that is part of the Apache Hadoop framework.
Paths are specified as remote: or remote:path/to/dir.
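As an illustration of the path syntax, here is a minimal Python sketch that splits a path of this form into the remote name and the path within it. The helper name `split_remote_path` is hypothetical, and the logic is deliberately simplified: rclone's own parser handles more cases (local paths, connection strings and so on) that are not covered here.

```python
from typing import Tuple


def split_remote_path(spec: str) -> Tuple[str, str]:
    """Split 'remote:path/to/dir' into ('remote', 'path/to/dir').

    Simplified illustration only, not rclone's actual parser.
    """
    name, sep, path = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"not a remote path: {spec!r}")
    return name, path


print(split_remote_path("remote:"))             # ('remote', '')
print(split_remote_path("remote:path/to/dir"))  # ('remote', 'path/to/dir')
```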
Here is an example of how to make a remote called remote. First run:
rclone config
This will guide you through an interactive setup process:
No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
[skip]
XX / Hadoop distributed file system
\ "hdfs"
[skip]
Storage> hdfs
** See help for hdfs backend at: https://rclone.org/hdfs/ **
hadoop name node and port
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Connect to host namenode at port 8020
\ "namenode:8020"
namenode> namenode.hadoop:8020
hadoop user name
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Connect to hdfs as root
\ "root"
username> root
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
Configuration complete.
Options:
- type: hdfs
- namenode: namenode.hadoop:8020
- username: root
Keep this "remote" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:
Name Type
==== ====
remote hdfs
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
This remote is called remote and can now be used like this:
See all the top level directories
rclone lsd remote:
List the contents of a directory
rclone ls remote:directory
Sync the remote directory to /home/local/directory, deleting any excess files.
rclone sync --interactive remote:directory /home/local/directory
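If these commands are driven from a script, one possible approach is to build the argument lists in Python and hand them to `subprocess`. The sketch below only constructs and prints the three command lines shown above. The helper `rclone_argv` is a hypothetical name, and actually running the commands assumes that rclone is installed and that the remote is configured.

```python
import shlex
from typing import List


def rclone_argv(command: str, *args: str, interactive: bool = False) -> List[str]:
    """Build an argument list for an rclone command (hypothetical helper)."""
    argv = ["rclone", command]
    if interactive:
        # --interactive asks for confirmation before destructive operations
        argv.append("--interactive")
    argv.extend(args)
    return argv


commands = [
    rclone_argv("lsd", "remote:"),
    rclone_argv("ls", "remote:directory"),
    rclone_argv("sync", "remote:directory", "/home/local/directory", interactive=True),
]

for argv in commands:
    # To execute for real: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.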
You may start with a manual setup or use the docker image from the tests:
If you want to build the docker image
git clone https://github.com/rclone/rclone.git
cd rclone/fstest/testserver/images/test-hdfs
docker build --rm -t rclone/test-hdfs .
Or you can just use the latest one pushed
docker run --rm --name "rclone-hdfs" -p 127.0.0.1:9866:9866 -p 127.0.0.1:8020:8020 --hostname "rclone-hdfs" rclone/test-hdfs
NB it needs a few seconds to start up.
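Rather than sleeping for a fixed time, a script can poll the namenode port until it accepts connections. The following is a minimal sketch; the function name `wait_for_port` and the default timings are assumptions made for illustration. Note that an open TCP port only shows that something is listening, not that HDFS is fully ready to serve requests.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a listener is present on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example for the container started above (port 8020 is published on 127.0.0.1):
# ready = wait_for_port("127.0.0.1", 8020)
```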
For this docker image the remote needs to be configured like this:
[remote]
type = hdfs
namenode = 127.0.0.1:8020
username = root
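The configuration above is an INI-style section, so it can also be generated programmatically. This sketch uses Python's standard `configparser` to produce the same section as text. Where to write the result is left open: the location of the rclone config file varies by system (`rclone config file` prints it), and an existing file should not be overwritten blindly.

```python
import configparser
import io

# Section name is the remote name; keys match the config shown above
config = configparser.ConfigParser()
config["remote"] = {
    "type": "hdfs",
    "namenode": "127.0.0.1:8020",
    "username": "root",
}

# Render to a string instead of touching any real config file
buf = io.StringIO()
config.write(buf)
print(buf.getvalue())
```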
You can stop this image with docker kill rclone-hdfs (NB it does not use volumes, so all data uploaded will be lost.)
Modification times are stored with an accuracy of 1 second.
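One consequence of 1 second accuracy is that a local file with a sub-second timestamp will not have exactly the same modification time once stored. The sketch below shows one way a script might compare timestamps with that tolerance. The function `same_mtime` is hypothetical and is not a description of how rclone itself compares times.

```python
from datetime import datetime, timedelta, timezone


def same_mtime(a: datetime, b: datetime, precision: timedelta = timedelta(seconds=1)) -> bool:
    """Treat two timestamps as equal if they differ by less than the precision."""
    return abs(a - b) < precision


local = datetime(2024, 1, 1, 12, 0, 0, 900000, tzinfo=timezone.utc)
stored = local.replace(microsecond=0)  # what a 1 second resolution store would keep

print(same_mtime(local, stored))                         # True
print(same_mtime(local, stored + timedelta(seconds=2)))  # False
```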
No checksums are implemented.
You can use the rclone about remote: command, which will display filesystem size and current usage.
In addition to the default restricted characters set the following characters are also replaced:
Character | Value | Replacement |
---|---|---|
: | 0x3A | ： |
Invalid UTF-8 bytes will also be replaced.
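To make the replacement concrete, here is a simplified sketch of the colon mapping, assuming the replacement character is the fullwidth colon (U+FF1A), as in rclone's usual encoding scheme. The function names are hypothetical. The sketch ignores the default restricted character set, invalid UTF-8 handling, and names that already contain a fullwidth colon, all of which the real encoder has to deal with.

```python
COLON = ":"                # 0x3A, not allowed in HDFS path components
FULLWIDTH_COLON = "\uff1a"  # assumed replacement character


def encode_name(name: str) -> str:
    """Replace ':' with its fullwidth equivalent before sending to the backend."""
    return name.replace(COLON, FULLWIDTH_COLON)


def decode_name(name: str) -> str:
    """Reverse the mapping when listing names from the backend."""
    return name.replace(FULLWIDTH_COLON, COLON)


original = "report 12:30.txt"
encoded = encode_name(original)
print(encoded)
print(decode_name(encoded) == original)  # True
```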
Here are the Standard options specific to hdfs (Hadoop distributed file system).
Hadoop name nodes and ports.
E.g. "namenode-1:8020,namenode-2:8020,..." to connect to host namenodes at port 8020.
Properties:
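The name node value described above is a comma-separated list of host:port pairs. As an illustration of that format, this sketch parses such a string into (host, port) tuples. The function `parse_namenodes` is a hypothetical helper, not part of rclone.

```python
from typing import List, Tuple


def parse_namenodes(value: str) -> List[Tuple[str, int]]:
    """Parse 'host1:port,host2:port' into a list of (host, port) tuples."""
    nodes = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        nodes.append((host, int(port)))
    return nodes


print(parse_namenodes("namenode-1:8020,namenode-2:8020"))
# [('namenode-1', 8020), ('namenode-2', 8020)]
```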
Hadoop user name.
Properties:
Here are the Advanced options specific to hdfs (Hadoop distributed file system).
Kerberos service principal name for the namenode.
Enables KERBEROS authentication. Specifies the Service Principal Name (SERVICE/FQDN) for the namenode. E.g. "hdfs/namenode.hadoop.docker" for namenode running as service 'hdfs' with FQDN 'namenode.hadoop.docker'.
Properties:
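The service principal name described above has the form SERVICE/FQDN. A small sketch of splitting and checking such a value follows; `parse_spn` is a hypothetical helper used only to illustrate the format.

```python
from typing import Tuple


def parse_spn(spn: str) -> Tuple[str, str]:
    """Split a 'SERVICE/FQDN' service principal name into its two parts."""
    service, sep, fqdn = spn.partition("/")
    if not sep or not service or not fqdn:
        raise ValueError(f"expected SERVICE/FQDN, got {spn!r}")
    return service, fqdn


print(parse_spn("hdfs/namenode.hadoop.docker"))  # ('hdfs', 'namenode.hadoop.docker')
```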
Kerberos data transfer protection: authentication|integrity|privacy.
Specifies whether or not authentication, data signature integrity checks, and wire encryption are required when communicating with the datanodes. Possible values are 'authentication', 'integrity' and 'privacy'. Used only with KERBEROS enabled.
Properties:
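The three protection values described above can be read as cumulative levels. The sketch below validates a value and reports what each level is assumed to require, following the semantics of Hadoop's `dfs.data.transfer.protection` setting. The names `PROTECTION_LEVELS` and `describe_protection` are hypothetical.

```python
from typing import Dict, Tuple

# Assumed mapping: each level includes the guarantees of the previous one
PROTECTION_LEVELS: Dict[str, Tuple[str, ...]] = {
    "authentication": ("authentication",),
    "integrity": ("authentication", "integrity checks"),
    "privacy": ("authentication", "integrity checks", "wire encryption"),
}


def describe_protection(value: str) -> Tuple[str, ...]:
    """Return the guarantees required for a given protection value."""
    try:
        return PROTECTION_LEVELS[value]
    except KeyError:
        allowed = ", ".join(PROTECTION_LEVELS)
        raise ValueError(f"invalid value {value!r}; expected one of: {allowed}") from None


print(describe_protection("privacy"))
# ('authentication', 'integrity checks', 'wire encryption')
```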
The encoding for the backend.
See the encoding section in the overview for more info.
Properties:
Description of the remote.
Properties:
Move or DirMove.