Our lab has two servers for text analytics / big data work: cseebigd1 and cseebigd2. They are reasonably high-performance Linux servers, each with 125 GB of RAM and 32 8-core Xeon CPUs. If you have problems or questions while using them, please contact Ans Alghamdi (email: adalgh).
Big Data Servers Q&A
Are there any guidelines for our servers?
Yes, this is a list I just made up now:
- NO ONE should install a lib/tool with root privileges (this includes su, su root, su - and sudo).
- This also applies when you're building code from source: do not use sudo or a system-wide make install. In most cases make should be enough if you set up your path correctly.
- Shared libs/tools (installed with root privileges) should be limited to source-building tools (e.g. gcc, make, python-setuptools, etc.).
- Port numbers for libs/tools should be kept at their defaults.
- For "centralised services" like MySQL, everyone should create their own database. You should also create a user to access your own db; do not use the admin/root user.
How can I install software?
Ideally you would download the source files and compile them in your own home directory. If you create a subdirectory named bin in your home directory, it should be picked up by your path. So try passing --prefix=$HOME when you run ./configure.
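As a rough sketch of the setup described above, the snippet below creates the working directories and puts bin on the path. PREFIX is a hypothetical variable name introduced here for illustration (it defaults to your home directory); the build commands in the comments are the usual autotools pattern and may differ from package to package.

```shell
# Sketch: prepare a per-user install prefix before building anything.
PREFIX="${PREFIX:-$HOME}"   # hypothetical variable; defaults to your home directory
mkdir -p "$PREFIX/bin" "$PREFIX/lib" "$PREFIX/src"

# Prepend $PREFIX/bin to PATH only if it is not already there.
case ":$PATH:" in
  *":$PREFIX/bin:"*) ;;
  *) export PATH="$PREFIX/bin:$PATH" ;;
esac

# A typical build then looks like this (comments only; adjust to the package):
#   ./configure --prefix="$PREFIX"
#   make
#   make install    # no sudo: files land under $PREFIX
echo "install prefix: $PREFIX"
```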
I need to run a command/tool with sudo. What shall I do?
If a command/tool will not run without sudo, it is normally one of these cases:
a- it should not be used in the first place,
b- there is a tool/lib that's not installed properly, or
c- it needs more privileges, which we might not consider giving at the moment.
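For case b, one way to look for a badly installed library is to ask the dynamic linker which shared libraries a tool cannot resolve. This is a minimal sketch, assuming ldd is available (it is on typical Linux systems); missing_libs is a hypothetical helper name, not an existing command.

```shell
# Print the shared libraries that a tool on your PATH cannot resolve.
# Prints nothing if every library is found; returns 2 if the tool is not on PATH.
missing_libs() {
  bin=$(command -v "$1") || { echo "$1: not found on PATH" >&2; return 2; }
  # ldd lists each required library; unresolved ones are marked "not found".
  ldd "$bin" 2>/dev/null | grep 'not found' || true
}

missing_libs sh
```

If a library is listed, building it under your home directory and rebuilding the tool is often a better first step than reaching for sudo.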
How can I install and use Weka?
You can run Weka from the terminal (command line). What you need to do is:
wget http://prdownloads.sourceforge.net/weka/weka-3-6-10.zip
unzip weka-3-6-10.zip
cd weka-3-6-10
And you are done! To run it you can do, for example:
java -cp ./weka.jar weka.classifiers.trees.J48 -t data/weather.nominal.arff -i
This page will give you a hint for available commands. The only difference from that page is that you need to add an explicit classpath to every command by adding "-cp ./weka.jar" after "java" in your command line, just like in the previous example.
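To avoid typing the classpath every time, a small wrapper function can add it for you. This is only a sketch: weka, WEKA_HOME and JAVA are hypothetical names chosen here, and WEKA_HOME assumes you unzipped Weka directly in your home directory.

```shell
# Where Weka was unzipped (assumption: directly under your home directory).
WEKA_HOME="${WEKA_HOME:-$HOME/weka-3-6-10}"
# Which java to run; override to point at a specific JVM if needed.
JAVA="${JAVA:-java}"

# Run any Weka class with weka.jar already on the classpath.
weka() {
  "$JAVA" -cp "$WEKA_HOME/weka.jar" "$@"
}

# Example (same as the command above, but runnable from any directory):
#   weka weka.classifiers.trees.J48 -t "$WEKA_HOME/data/weather.nominal.arff" -i
```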
How can I install IRSTLM?
cd
mkdir -p src lib bin
export PATH=$HOME/bin:$PATH
cd src
wget http://downloads.sourceforge.net/project/irstlm/irstlm/irstlm-5.80/irstlm-5.80.03.tgz
tar xvzf irstlm-5.80.03.tgz
cd irstlm-5.80.03
sh regenerate-makefiles.sh --force
./configure --prefix=$HOME
make
make install
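To confirm that the install landed in your home directory rather than picking up a system copy, you can check where a command resolves. A small sketch follows; installed_in_home is a hypothetical helper, and compile-lm is used on the assumption that it is one of the binaries IRSTLM installs.

```shell
# Succeed only if COMMAND resolves to a file under $HOME.
installed_in_home() {
  resolved=$(command -v "$1") || return 1
  case "$resolved" in
    "$HOME"/*) return 0 ;;
    *)         return 1 ;;
  esac
}

if installed_in_home compile-lm; then
  echo "compile-lm is installed under $HOME"
else
  echo "compile-lm was not found under $HOME; check that $HOME/bin is on your PATH"
fi
```

The same check should work for the other tools on this page that are built with --prefix=$HOME.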
How can I install MITLM?
cd
cd src
wget https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz
tar xvzf mitlm-0.4.1.tar.gz
cd mitlm-0.4.1
autoconf
./configure --prefix=$HOME
make
make install
How can I tell Moses where to find Boost, KenLM or ORLM?
Before you compile Moses you can configure it to find whatever library it needs, for instance:
./configure --prefix=$HOME --with-boost=/path/to/boost
Please go to this link before you compile your copy.
How can I set my JAVA_HOME?
You can point it to either Java 6 or Java 7:
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
Then log out and back in again (or run: exec $SHELL).
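Note that the two echo lines append a fresh copy to ~/.bashrc every time they are re-run. If that bothers you, one possible variant only appends a line when it is not already present. This is a sketch; append_once is a hypothetical helper name.

```shell
# Append LINE to FILE only if FILE does not already contain that exact line.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines only, -F treats the pattern as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

append_once 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' "$HOME/.bashrc"
append_once 'export PATH=$JAVA_HOME/bin:$PATH' "$HOME/.bashrc"
```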
How can I install CRF++?
cd
cd src
wget https://crfpp.googlecode.com/files/CRF%2B%2B-0.58.tar.gz
tar -zxvf CRF++-0.58.tar.gz
cd CRF++-0.58
./configure --prefix=$HOME
make
make install
How can I install CRFsuite?
You need to download and build liblbfgs first:
cd
cd src
wget https://github.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz
tar -zxvf liblbfgs-1.10.tar.gz
cd liblbfgs-1.10
./configure --prefix=$HOME
make
make install
Now the same for CRFsuite:
cd
cd src
wget https://github.com/downloads/chokkan/crfsuite/crfsuite-0.12.tar.gz
tar -zxvf crfsuite-0.12.tar.gz
cd crfsuite-0.12
./configure --prefix=$HOME
make
make install
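Because liblbfgs ends up under your home directory rather than in a system location, the CRFsuite build and the resulting binary may need to be told where to find it. The sketch below uses the standard LD_LIBRARY_PATH mechanism and, in comments, the usual autoconf CPPFLAGS/LDFLAGS variables; whether you need any of this depends on how the compiler and linker are set up on the servers, which is an assumption here.

```shell
# Let the dynamic linker find libraries installed under $HOME/lib at run time.
# Prepend only once, and avoid a stray ":" when the variable was empty.
case ":${LD_LIBRARY_PATH:-}:" in
  *":$HOME/lib:"*) ;;
  *) export LD_LIBRARY_PATH="$HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac

# At build time, the usual autoconf variables point the compiler at $HOME:
#   ./configure --prefix="$HOME" CPPFLAGS="-I$HOME/include" LDFLAGS="-L$HOME/lib"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```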
Is MySQL installed?
Yes. We have already tuned MySQL to fit most use cases and perform better on our servers. The global variables we set are:
key_buffer = 32G
max_allowed_packet = 1G
thread_stack = 1M
thread_cache_size = 20
max_connections = 300
table_open_cache = 150
query_cache_limit = 10M
query_cache_size = 160M
If there's something you think we should change, feel free to drop Ans an email. However, to start using MySQL, please first create your own username and password, create a database, and then grant your username privileges on that database only. This is the only way to guarantee you don't mess with others' data.
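The steps above (own user, own database, privileges on that database only) can be sketched as a small helper that prints the SQL for you to review. The helper name mysql_setup_sql and the values alice, alice_corpus and change-me are placeholders invented for this example; which account you use to run the statements depends on how MySQL accounts are handed out here, which this page does not specify.

```shell
# Print the SQL for a private database and a user restricted to it.
# Usage: mysql_setup_sql USER DBNAME PASSWORD   (all three are placeholders)
mysql_setup_sql() {
  user=$1
  db=$2
  pass=$3
  printf "CREATE DATABASE %s;\n" "$db"
  printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" "$user" "$pass"
  # Privileges are granted on this one database only, not on *.*
  printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost';\n" "$db" "$user"
}

# Review the statements before running them through the mysql client.
mysql_setup_sql alice alice_corpus 'change-me'
```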
Can I install a specific version of Ruby?
Yes, I would recommend using rbenv.
Can I install a specific version of Python?
Yes, I would recommend using pyenv.
What libs/tools/packages are already installed?
build-essential
libz-dev
libreadline-dev
libssl-dev
libgdbm-dev
libsqlite3-dev
libbz2-dev
liblzma-dev
tk-dev
libdb-dev
libncursesw5-dev
libreadline5-dev
libc6-dev
libpoppler-dev
python-pip
python-dev
python-twisted
flex
zip
unzip
p7zip-full
openjdk-7-jre-headless
openjdk-7-jdk
openjdk-6-jre-headless
openjdk-6-jdk
ant
libtool
gfortran
libboost-all-dev
automake
mysql-server
sysstat
gawk
maven
gradle
r-base
r-base-dev
smartmontools
gsmartcontrol
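The package names above follow Debian/Ubuntu naming, so, assuming the servers use dpkg, you can check a specific package yourself without root. This is a sketch; is_installed is a hypothetical helper name.

```shell
# Succeed if the named package is installed according to dpkg.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'ok installed'
}

for pkg in build-essential libboost-all-dev maven; do
  if is_installed "$pkg"; then
    echo "$pkg: installed"
  else
    echo "$pkg: not installed"
  fi
done
```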