Hadoop Study Notes: Setting Up a Fully Distributed Hadoop Cluster

By timebusker on March 15, 2018


# Edit /etc/profile to set the system environment variables

# Note: many articles set a long list of Hadoop environment variables; most of them are redundant.
# Variables such as HADOOP_HOME are detected and generated automatically by the Hadoop startup scripts.
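Following the note above, a minimal `/etc/profile` addition only needs `JAVA_HOME`, plus (for convenience) the Hadoop `bin`/`sbin` directories on `PATH`. The install paths below are assumptions; substitute your own:

```shell
# Minimal additions to /etc/profile (example paths -- adjust to your layout).
# HADOOP_HOME itself is optional: the start scripts derive it automatically.
export JAVA_HOME=/usr/local/jdk1.8.0_152
export PATH=$PATH:$JAVA_HOME/bin:/usr/local/hadoop-2.7.3/bin:/usr/local/hadoop-2.7.3/sbin
```

Run `source /etc/profile` afterwards so the current shell picks up the change.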
# Edit core-site.xml

<!-- Values shown are defaults or common example settings; tune them per cluster. -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- example path; point this at a persistent directory that exists on every node -->
  <value>/home/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <!-- example value; use your NameNode's hostname -->
  <value>hdfs://master:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>fs.trash.interval</name>
  <!-- 0 (the default) disables trash; 1440 keeps deleted files for one day -->
  <value>1440</value>
  <description>Number of minutes after which the checkpoint
  gets deleted.  If zero, the trash feature is disabled.
  This option may be configured both on the server and the
  client. If trash is disabled server side then the client
  side configuration is checked. If trash is enabled on the
  server side then the value configured on the server is
  used and the client configuration value is ignored.</description>
</property>
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization, org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization, org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
  <description>A list of serialization classes that can be used for
  obtaining serializers and deserializers.</description>
</property>
# ----------------------------------------------------------------------------------------------------------------------------------
# Configure HDFS compression and decompression codecs
# ----------------------------------------------------------------------------------------------------------------------------------
<property>
  <name>io.native.lib.available</name>
  <value>true</value>
  <description>Controls whether to use native libraries for bz2 and zlib
    compression codecs or not. The property does not control any other native
    libraries.</description>
</property>
<property>
  <name>io.compression.codec.bzip2.library</name>
  <value>system-native</value>
  <description>The native-code library to be used for compression and
  decompression by the bzip2 codec.  This library could be specified
  either by name or the full pathname.  In the former case, the
  library is located by the dynamic linker, usually searching the
  directories specified in the environment variable LD_LIBRARY_PATH.

  The value of "system-native" indicates that the default system
  library should be used.  To indicate that the algorithm should
  operate entirely in Java, specify "java-builtin".</description>
</property>
<property>
  <name>io.compression.codecs</name>
  <!-- left empty: codec classes on the classpath are discovered automatically -->
  <value></value>
  <description>A comma-separated list of the compression codec classes that can
  be used for compression/decompression. In addition to any classes specified
  with this property (which take precedence), codec classes on the classpath
  are discovered using a Java ServiceLoader.</description>
</property>
# Edit hdfs-site.xml

<!-- Values shown are the stock defaults; adjust replication, directories, etc. per cluster. -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
  <description>
    If "true", enable permission checking in HDFS.
    If "false", permission checking is turned off,
    but all other behavior is unchanged.
    Switching from one parameter value to the other does not change the mode,
    owner or group of files or directories.
  </description>
</property>
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>supergroup</value>
  <description>The name of the group of super-users.</description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem a DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.</description>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
  <description>Permissions for the directories on the local filesystem where
  the DFS data node stores its blocks. The permissions can either be octal or
  symbolic.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>
<property>
  <name>dfs.namenode.replication.min</name>
  <value>1</value>
  <description>Minimal block replication.</description>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
  <description>This property is used only if the value of
    dfs.client.block.write.replace-datanode-on-failure.enable is true.

    ALWAYS: always add a new datanode when an existing datanode is removed.
    NEVER: never add a new datanode.

    DEFAULT:
      Let r be the replication number.
      Let n be the number of existing datanodes.
      Add a new datanode only if r is greater than or equal to 3 and either
      (1) floor(r/2) is greater than or equal to n; or
      (2) r is greater than n and the block is hflushed/appended.
  </description>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
  <description>
      The default block size for new files, in bytes.
      You can use the following suffix (case insensitive):
      k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
      Or provide complete size in bytes (such as 134217728 for 128 MB).
  </description>
</property>
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>1048576</value>
  <description>Minimum block size in bytes, enforced by the Namenode at create
      time. This prevents the accidental creation of files with tiny block
      sizes (and thus many blocks), which can degrade performance.</description>
</property>
<property>
  <name>dfs.namenode.fs-limits.max-blocks-per-file</name>
  <value>1048576</value>
  <description>Maximum number of blocks per file, enforced by the Namenode on
        write. This prevents the creation of extremely large files which can
        degrade performance.</description>
</property>
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>1048576</value>
  <description>Defines the maximum number of items that a directory may
      contain. Cannot set the property to a value less than 1 or more than
      6400000.</description>
</property>
# Edit mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs.
  Can be one of local, classic or yarn.</description>
</property>
# Edit yarn-site.xml

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>A comma separated list of services where service name should only
    contain a-zA-Z0-9_ and can not start with numbers</description>
</property>





Distribute the Hadoop directory configured above to every node, then format the NameNode on the master node. Despite the common phrasing "disk format", this only initializes the HDFS metadata; no disks are touched. The old `hadoop namenode -format` form is deprecated in favor of `hdfs namenode -format`.
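The distribution step can be sketched as a simple loop over the worker nodes. The node names, remote user, and install path below are assumptions; replace them with your own:

```shell
#!/bin/sh
# Distribute the configured Hadoop directory to every worker node.
# NODES, the 'hadoop' user, and HADOOP_DIR are example values.
NODES="node1 node2 node3"
HADOOP_DIR=/usr/local/hadoop-2.7.3

for node in $NODES; do
  # Print the copy command as a preview; drop the leading 'echo' to actually copy.
  echo scp -r "$HADOOP_DIR" "hadoop@$node:/usr/local/"
done
```

This assumes passwordless SSH from the master to each node has already been set up; `rsync -a` works equally well and skips unchanged files on repeated runs.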

# Format the namenode
# This mainly does three things:
# - creates a brand-new metadata storage directory (dfs.namenode.name.dir/current)
# - generates the fsimage file that records the metadata
# - generates the VERSION file recording the newly assigned cluster ID
./bin/hdfs namenode -format

# Start the services
./sbin/start-dfs.sh
./sbin/start-yarn.sh
# start-all.sh also works, but is deprecated in favor of the two scripts above.
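Once the start scripts have run, `jps` on each node shows which daemons it hosts. The expected split for a basic master/worker layout looks like this (daemon names are standard; which node runs the SecondaryNameNode depends on your configuration):

```shell
# Expected daemons per role after start-dfs.sh and start-yarn.sh.
master_daemons="NameNode SecondaryNameNode ResourceManager"
worker_daemons="DataNode NodeManager"
printf 'master: %s\n' "$master_daemons"
printf 'worker: %s\n' "$worker_daemons"
# On the live cluster, compare with the output of:  jps
# and check overall HDFS health with:               hdfs dfsadmin -report
```

If a DataNode is missing from the report, re-check that the formatted cluster ID matches on that node (reformatting the NameNode without clearing old DataNode directories is a classic cause).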