pigwyl posted on 2009-7-18 00:22:57

[Original] The Hadoop distributed file system on FreeBSD (notes)

Preface: this took nearly a week of exploration. Having previously set up an Apache+MySQL+Zend+PHP stack on FreeBSD, I am reasonably comfortable with FreeBSD, and with some material collected from the web the setup now runs correctly. This is only a trial and has not yet been put into any production system; I am still working through API issues. The steps below may contain mistakes, and corrections from experts are welcome.
Reposting is welcome; please credit the original author: pigwyl QQ:86206221 email: pigwyl#gmail.com
Many thanks to teacher Yu at the school's network center for providing the environment for this setup.

I. Preparation
1. Software versions
FreeBSD: 7.2
JDK 1.5
2. Installing the system
Do a full install, otherwise the JDK build may fail.
Installing and configuring SSH
First edit /etc/inetd.conf with ee, remove the # in front of the ssh line, then save and exit.
Edit /etc/rc.conf and append at the end:
sshd_enable="yes"
Start the sshd service:
#/etc/rc.d/sshd start
Finally, edit the sshd configuration:
ee /etc/ssh/sshd_config
Below is my configuration file (/etc/ssh/sshd_config):
####################################################
Protocol 2
# allow only members of the wheel group
AllowGroups wheel
IgnoreRhosts yes
IgnoreUserKnownHosts yes
PrintMotd yes
StrictModes no
RSAAuthentication yes
X11Forwarding no
# allow root logins
PermitRootLogin yes
# allow logins with an empty password
PermitEmptyPasswords yes
# whether password authentication is allowed
PasswordAuthentication yes
MaxStartups 5

AuthorizedKeysFile .ssh/authorized_keys
##############################################
Remember to restart the sshd server after changing the configuration file (/etc/rc.d/sshd restart).
Adding a user
#sysinstall
Choose [Configure] -- [User Management] -- [User]; just put wheel in the member-groups field and fill in the rest as you like.
After rebooting you can log in with SecureCRT.
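The rc.conf step above can be scripted idempotently. This is only a sketch: it works on a scratch copy of rc.conf so it can run anywhere, whereas on the real machine you would point it at /etc/rc.conf and then run /etc/rc.d/sshd restart.

```shell
# Demonstrate the rc.conf edit on a scratch copy (on a real FreeBSD box,
# use /etc/rc.conf instead of the temp file).
rc=$(mktemp)
echo 'hostname="vc1.tj.com"' > "$rc"          # stand-in for an existing rc.conf

# Append sshd_enable only if no such knob is present yet (idempotent).
grep -q '^sshd_enable=' "$rc" || echo 'sshd_enable="yes"' >> "$rc"

grep '^sshd_enable' "$rc"                     # prints: sshd_enable="yes"
rm -f "$rc"
```

Running it twice still leaves a single sshd_enable line, so re-running the setup never duplicates the knob.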


II. OS and environment configuration
1. Installing JDK 1.5
First download:
http://karakurty.info/ftp/pub/FreeBSD/ports/distfiles/jdk-1_5_0_16-fcs-bin-b02-jrl-28_may_2008.jar
http://karakurty.info/ftp/pub/FreeBSD/ports/distfiles/jdk-1_5_0_16-fcs-src-b02-jrl-28_may_2008.jar
http://karakurty.info/ftp/pub/FreeBSD/ports/distfiles/tzupdater-1_3_12-2009a.zip
http://karakurty.info/ftp/pub/FreeBSD/ports/distfiles/bsd-jdk15-patches-9.tar.bz2
http://www.freebsdfoundation.org/cgi-bin/download?download=diablo-caffe-freebsd7-i386-1.6.0_07-b02.tar.bz2

Put the downloaded files into the /usr/ports/distfiles/ directory, then:
#cd /usr/ports/java/jdk15
#make install clean
When the build finishes you will see output like:
===>Cleaning for unzip-5.52_5
===>Cleaning for m4-1.4.12,1
===>Cleaning for zip-3.0
===>Cleaning for open-motif-2.2.3_6
===>Cleaning for gmake-3.81_3
===>Cleaning for libX11-1.2.1,1
===>Cleaning for libXext-1.0.5,1
===>Cleaning for libXi-1.2.1,1
===>Cleaning for libXmu-1.0.4,1
===>Cleaning for libXp-1.0.0,1
===>Cleaning for libXt-1.0.5_1
===>Cleaning for libXtst-1.0.3_1
===>Cleaning for pkg-config-0.23_1
===>Cleaning for desktop-file-utils-0.15_1
===>Cleaning for nspr-4.7
===>Cleaning for libiconv-1.11_1
===>Cleaning for glib-2.20.1
===>Cleaning for javavmwrapper-2.3.2
===>Cleaning for gio-fam-backend-2.20.1
===>Cleaning for libsigsegv-2.5
===>Cleaning for libXaw-1.0.5_1,1
===>Cleaning for xbitmaps-1.0.1
===>Cleaning for libtool-1.5.26
===>Cleaning for gettext-0.17_1
===>Cleaning for libxcb-1.2_1
===>Cleaning for xorg-macros-1.2.1
===>Cleaning for bigreqsproto-1.0.2
===>Cleaning for xcmiscproto-1.1.2
===>Cleaning for xextproto-7.0.5
===>Cleaning for xtrans-1.2.3
===>Cleaning for kbproto-1.0.3
===>Cleaning for inputproto-1.5.0
===>Cleaning for xf86bigfontproto-1.1.2
===>Cleaning for libXau-1.0.4
===>Cleaning for libXdmcp-1.0.2_1
===>Cleaning for xproto-7.0.15
===>Cleaning for automake-1.10.1
===>Cleaning for autoconf-2.62
===>Cleaning for printproto-1.0.4
===>Cleaning for libSM-1.1.0_1,1
===>Cleaning for recordproto-1.13.2
===>Cleaning for perl-5.8.9_2
===>Cleaning for python25-2.5.4_1
===>Cleaning for pcre-7.9
===>Cleaning for gamin-0.1.10_1
===>Cleaning for libXpm-3.5.7
===>Cleaning for libcheck-0.9.6
===>Cleaning for libxslt-1.1.24_2
===>Cleaning for xcb-proto-1.4
===>Cleaning for libpthread-stubs-0.1
===>Cleaning for automake-wrapper-20071109
===>Cleaning for help2man-1.36.4_2
===>Cleaning for autoconf-wrapper-20071109
===>Cleaning for libICE-1.0.4_1,1
===>Cleaning for libxml2-2.7.3
===>Cleaning for p5-gettext-1.05_2
===>Cleaning for jdk-1.5.0.16p9_1,1

2. Installing rsync and bash
a. #cd /usr/ports/net/rsync
#make install clean
b. #cd /usr/ports/shells/bash
#make install clean

3. System configuration
a. The servers used in this setup are:
vc1 (the namenode; since this test had only two machines it doubles as a datanode, which is not recommended in a real production environment)
vc2 (a datanode)
b. Configure each server's hosts file so that the namenode can reach every datanode by hostname (vc1 and vc2; if the namenode doubles as a datanode it must also be able to reach itself by hostname), and every datanode can reach the namenode by hostname.
c. Example
==========================/etc/hosts===========================================
::1          localhost localhost.rhinux.com
127.0.0.1    localhost localhost.rhinux.com
200.0.0.003  vc1.tj.com vc1
200.0.0.004  vc2.tj.com vc2
======================(namenode & datanode)==================================
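Before starting Hadoop it is worth sanity-checking these entries. The sketch below writes them to a scratch file and confirms each node name appears exactly once; on a live machine you would grep /etc/hosts itself, or simply `ping vc1` / `ping vc2`.

```shell
# Write the example entries to a scratch hosts file and verify that each
# hostname appears exactly once (a duplicate or missing entry is a common
# cause of namenode/datanode connection trouble).
hosts=$(mktemp)
cat > "$hosts" <<'EOF'
::1          localhost localhost.rhinux.com
127.0.0.1    localhost localhost.rhinux.com
200.0.0.003  vc1.tj.com vc1
200.0.0.004  vc2.tj.com vc2
EOF

for h in vc1 vc2; do
    n=$(grep -c "[[:space:]]$h\$" "$hosts")
    echo "$h: $n entry(ies)"
done
rm -f "$hosts"
```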

Passwordless SSH setup

If you cannot ssh to localhost without entering a password, run the commands below.
1. Generate a key pair. For simplicity I generated it on vc1; all of the following steps were done as the hadoop user.
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
c34:4d:24:64:c4:9d:2b:76:1a:6f:7ec:1a:60:23 root@fedora3
Here id_rsa is the private key and id_rsa.pub is the public key; the private key stays on the local client, and the public key is copied to the server you want to ssh into.
2. Install the public key
On vc1, copy the public key to the server vc2:
scp ~/.ssh/id_rsa.pub vc2:/tmp
ssh vc2
cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys
rm -f /tmp/id_rsa.pub
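One detail the steps above gloss over: ~/.ssh must be mode 700 and authorized_keys mode 600, or sshd with StrictModes enabled silently refuses the key. A sketch of the append-plus-chmod dance, run in a scratch directory with a fake key string so it is safe to execute anywhere:

```shell
# Simulate installing a public key into authorized_keys with the
# permissions sshd expects (the key string here is a fake placeholder).
home=$(mktemp -d)
mkdir -p "$home/.ssh"
chmod 700 "$home/.ssh"                        # sshd wants 700 on ~/.ssh

pubkey='ssh-rsa AAAAB3...fake... hadoop@vc1'  # placeholder, not a real key
echo "$pubkey" >> "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"        # and 600 on authorized_keys

ls -l "$home/.ssh/authorized_keys"
rm -rf "$home"
```

The sshd_config above sets StrictModes no, which relaxes this check, but the strict permissions cost nothing and let you turn StrictModes back on later.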



Then download hadoop-0.18.3.tar.gz

and unpack it into the /usr/home/hadoop/ directory.

Edit /usr/home/hadoop/hadoop-0.18.3/conf/hadoop-env.sh:

export JAVA_HOME=/usr/local/jdk1.5.0  # path to your Java installation

export HADOOP_HOME=/usr/home/hadoop/hadoop-0.18.3  # path to Hadoop

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"



Back in the /usr/home/hadoop/hadoop-0.18.3 directory, edit

conf/masters

localhost

conf/slaves

localhost

vc1

vc2



Configure conf/hadoop-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!-- your namenode: host plus port -->
<name>fs.default.name</name>
<value>hdfs://200.0.0.003:54310/</value>
</property>
<property>
<!-- your JobTracker: host plus port -->
<name>mapred.job.tracker</name>
<value>200.0.0.003:54311</value>
</property>
<property>
<!-- how many copies of each block to keep; the default is three -->
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<!-- Hadoop's default temporary path; best to set this explicitly. If a DataNode mysteriously fails to start after adding a node or in other situations, delete the tmp directory under this path. Note that if you delete this directory on the NameNode machine, you must re-run the NameNode format command. -->
<name>hadoop.tmp.dir</name>
<value>/home/wenchu/hadoop/tmp/</value>
</property>
<property>
<!-- JVM options for child tasks; tune as needed -->
<name>mapred.child.java.opts</name>
<value>-Xmx512m</value>
</property>
<property>
<!-- block size in bytes, used later on; it must be a multiple of 512, because CRC file-integrity checking uses 512 bytes as the minimum checksum unit -->
<name>dfs.block.size</name>
<value>5120000</value>
<description>The default block size for new files.</description>
</property>
</configuration>

Then start the Hadoop daemons. On the very first start, format the HDFS namespace first:

#cd /usr/home/hadoop/hadoop-0.18.3/bin

#./hadoop namenode -format

#./start-all.sh



Then check that everything is running:

#./hadoop dfsadmin -report
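To check the report from a script rather than by eye, something like the sketch below can pull the datanode count out of saved `dfsadmin -report` output. The "Datanodes available:" label and the sample lines are assumptions about the 0.18-era report format; adjust the pattern to whatever your version actually prints.

```shell
# Parse a saved dfsadmin -report (the sample text stands in for real
# output; the field label is an assumption, check your actual report).
report='Total raw bytes: 10240000 (9.77 MB)
Remaining raw bytes: 8192000 (7.81 MB)
Datanodes available: 2'

count=$(printf '%s\n' "$report" | awk -F': ' '/^Datanodes available/ {print $2}')
echo "datanodes: $count"          # prints: datanodes: 2

# Fail loudly if fewer nodes report in than expected (2 in this setup).
[ "$count" -ge 2 ] && echo OK || echo "MISSING DATANODES"
```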

tsw_tongxf posted on 2009-7-18 00:52:54

I really don't use UNIX..............

Still, have a bump; this wasn't easy.....

pigwyl posted on 2009-7-18 13:42:31

This took me a week. The early configuration was done in virtual machines; once it was mostly working I moved to three servers plus one DNS server.