Big Data Servers

Our lab has two servers for text analytics / big data work: cseebigd1 and cseebigd2. They are reasonably high-performance Linux servers, each with 125 GB of RAM and 32 8-core Xeon CPUs. If you have problems or questions about using them, please contact Ans Alghamdi (email: adalgh).


Big Data Servers Q&A

Are there any guidelines for our servers?
Yes, here is the current list:

  • NO ONE should install a lib/tool with root privileges (this includes su, su root, su - and sudo).
  • This also applies when you're building code from source: do not use sudo make install. In most cases make (with make install into your own home directory) should be enough if you set up your path correctly.
  • Shared libs/tools (installed with root privileges) should be limited to source building tools ONLY (e.g. gcc, make, python-setuptools, etc).
  • Port numbers for libs/tools should be kept at their defaults.
  • For "centralised services" like MySQL, everyone should create their own database. You should also create a user to access your own db; do not use the admin/root user.

How can I install software?
Ideally you would download the source files and compile them in your own home directory. If you create a subdirectory named bin in your home directory, it should be picked up by your path. So try passing --prefix=$HOME when you run ./configure.
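
Whether ~/bin is picked up automatically depends on your login scripts, so it can help to add it yourself. The following is a minimal sketch, not an official setup script; path_prepend is a hypothetical helper name.

```shell
# Prepend a directory to PATH unless it is already listed.
# (path_prepend is a hypothetical helper, not something installed on the servers.)
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it in front
    esac
    export PATH
}

# Anything configured with --prefix=$HOME installs its programs into ~/bin.
path_prepend "$HOME/bin"
```

To make this permanent you would put the same lines in your ~/.bashrc.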

I need to run a command/tool with sudo. What shall I do?
If you have a command/tool that will not run without sudo, it's normally for one of these reasons:
a- it should not be used in the first place,
b- there is a tool/lib that's not installed properly, or
c- it needs more privileges, which we might not consider giving at the moment.

How can I install and use Weka?
You can run Weka from the terminal (command line). What you need to do is:

wget http://prdownloads.sourceforge.net/weka/weka-3-6-10.zip
unzip weka-3-6-10.zip
cd weka-3-6-10

And you are done! To run it you can do, for example:
java -cp ./weka.jar weka.classifiers.trees.J48 -t data/weather.nominal.arff  -i

This page will give you a hint for available commands. The only difference from that page is that you need to add an explicit classpath to every command, by adding "-cp ./weka.jar" after "java" in your command line, just like our previous example.
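
If typing the classpath every time gets tedious, a small shell function can add it for you. This is only a sketch: WEKA_HOME and JAVA are hypothetical variables, and the default assumes you unzipped Weka directly in your home directory.

```shell
# WEKA_HOME is an assumed location: the directory created by the unzip step.
WEKA_HOME="${WEKA_HOME:-$HOME/weka-3-6-10}"

# Run any Weka class with the classpath filled in.
# JAVA is a hypothetical override for choosing a specific java binary.
weka() {
    "${JAVA:-java}" -cp "$WEKA_HOME/weka.jar" "$@"
}

# Usage, equivalent to the J48 example above:
#   weka weka.classifiers.trees.J48 -t "$WEKA_HOME/data/weather.nominal.arff" -i
```

Because the function uses an absolute path to weka.jar, it works from any directory, not only from inside weka-3-6-10.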

How can I install IRSTLM?

cd
mkdir -p src lib bin
export PATH=$HOME/bin:$PATH
cd src
wget http://downloads.sourceforge.net/project/irstlm/irstlm/irstlm-5.80/irstlm-5.80.03.tgz
tar xvzf irstlm-5.80.03.tgz
cd irstlm-5.80.03
sh regenerate-makefiles.sh --force
./configure --prefix=$HOME
make
make install
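
With --prefix=$HOME the programs are installed into $HOME/bin, so a quick check that they are reachable from your PATH can save some confusion later. A minimal sketch follows; check_on_path is a hypothetical helper, and the tool names in the usage comment are assumed to be among the programs IRSTLM installs, so adjust them to whatever you find in ~/bin.

```shell
# Report whether each named program can be found on PATH.
# Returns non-zero if at least one is missing.
check_on_path() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'ok       %s -> %s\n' "$tool" "$(command -v "$tool")"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tool names are assumptions, check ls ~/bin for the real list):
#   check_on_path compile-lm build-lm.sh
```

The same check works after any of the other installs on this page.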

How can I install MITLM?

cd
cd src
wget https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz
tar xvzf mitlm-0.4.1.tar.gz
cd mitlm-0.4.1
autoconf
./configure --prefix=$HOME
make
make install

How can I tell Moses where to find Boost, KenLM or ORLM?
Before you compile Moses you can configure it to find whatever library it needs, for instance:

./configure --prefix=$HOME --with-boost=/path/to/boost

Please go to this link before you compile your copy.

How can I set my JAVA_HOME?
You can point it to either Java 6 or Java 7. For example, for Java 7:

echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc

Then log out and back in again (or run: exec $SHELL).
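
Note that echo with >> appends a new copy of the line every time you run it. If you expect to repeat these steps, a small guard keeps ~/.bashrc clean. This is a sketch only; append_once is a hypothetical helper name.

```shell
# Append a line to a file only if that exact line is not already there.
# $1 = the line, $2 = the file.
append_once() {
    # -x matches whole lines, -F treats the text literally (no regex).
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Usage, same effect as the two echo lines above but safe to repeat:
#   append_once 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' ~/.bashrc
#   append_once 'export PATH=$JAVA_HOME/bin:$PATH' ~/.bashrc
```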

How can I install CRF++?

cd
cd src
wget https://crfpp.googlecode.com/files/CRF%2B%2B-0.58.tar.gz
tar -zxvf CRF++-0.58.tar.gz
cd CRF++-0.58
./configure --prefix=$HOME
make 
make install

How can I install CRFsuite?
You need to download and build liblbfgs first:

cd
cd src
wget https://github.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz
tar -zxvf liblbfgs-1.10.tar.gz
cd liblbfgs-1.10
./configure --prefix=$HOME
make 
make install

Now the same for CRFsuite:

cd
cd src
wget https://github.com/downloads/chokkan/crfsuite/crfsuite-0.12.tar.gz
tar -zxvf crfsuite-0.12.tar.gz
cd crfsuite-0.12
./configure --prefix=$HOME
make 
make install
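
Because liblbfgs ends up under $HOME/lib rather than in a system directory, the CRFsuite build or the finished crfsuite program may not find it on its own. The sketch below shows one common way around that, assuming both packages were configured with --prefix=$HOME as above. The --with-liblbfgs option is described in CRFsuite's build instructions, but please confirm it with ./configure --help on your copy.

```shell
# Make shared libraries installed under ~/lib visible to the dynamic linker.
# The ${VAR:+...} form avoids a stray ':' when LD_LIBRARY_PATH is unset.
export LD_LIBRARY_PATH="$HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# If CRFsuite's configure cannot locate liblbfgs, pointing it at the same
# prefix may help (assumption: verify the option with ./configure --help):
#   ./configure --prefix=$HOME --with-liblbfgs=$HOME
```

Add the export line to your ~/.bashrc if you want it in every session.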

Is MySQL installed?
Yes. We have already tuned MySQL to fit most use cases and to perform better on our servers. The global variables we set are:

key_buffer              = 32G
max_allowed_packet      = 1G
thread_stack            = 1M
thread_cache_size       = 20
max_connections        = 300
table_cache            = 150
query_cache_limit       = 10M
query_cache_size        = 160M

If there's something you think we should change, feel free to drop Ans an email. However, to start using MySQL, please first create your own username and password, create a database, and then grant your username privileges on that database only. This is the only way you can guarantee you don't interfere with others' data.
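
The steps above can be sketched as three SQL statements. The helper below only prints them, so you can review them before running them in a mysql session that is allowed to create users (this page does not say which account that is, so ask if unsure). mysql_setup_sql is a hypothetical name, and the user, database and password in the usage comment are placeholders.

```shell
# Print the SQL for a private database and a user limited to it.
# Arguments: $1 = user name, $2 = database name, $3 = password.
mysql_setup_sql() {
    printf 'CREATE DATABASE IF NOT EXISTS `%s`;\n' "$2"
    printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" "$1" "$3"
    # Privileges are granted on this one database only, not on *.*
    printf "GRANT ALL PRIVILEGES ON \`%s\`.* TO '%s'@'localhost';\n" "$2" "$1"
}

# Usage (placeholder names):
#   mysql_setup_sql alice alice_db 'choose-a-password'
```

Afterwards you would connect as your own user, for example with mysql -u alice -p alice_db, rather than as the admin/root user.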

Can I install a specific version of Ruby?
Yes, I would recommend using rbenv.

Can I install a specific version of Python?
Yes, I would recommend using pyenv.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License