Our lab has two servers for text analytics / big data work: cseebigd1 and cseebigd2. They are reasonably high-performance Linux servers, each with 125 GB of RAM and 32 8-core Xeon CPUs. If you have problems or questions about using them, please contact Ans Alghamdi (email: adalgh).
Big Data Servers Q&A
Are there any guidelines for using our servers?
Yes, here is a list I have just put together:
- NO ONE should install libraries/tools with root privileges (this includes su, su root, su - and sudo).
- This also applies when you build code from source: do not use sudo, and do not run make install into system-wide locations. Configure the build with --prefix=$HOME so that make install only writes into your home directory (in many cases plain make is enough if you set up your PATH correctly).
- Shared libraries/tools (installed with root privileges) should ONLY be build tools (e.g. gcc, make, python-setuptools, etc.).
- Libraries and tools should keep their default port numbers.
- For "centralised services" like MySQL, everyone should create their own database and a dedicated user to access it. Do not use the admin/root user.
How can I install software?
Ideally you should download the source files and compile them in your own home directory. If you create a subdirectory named bin in your home directory, it should be picked up by your PATH the next time you log in. So pass --prefix=$HOME when you run ./configure, and the programs will be installed into ~/bin.
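Here is a minimal bash sketch of that setup. It assumes you install everything under your home directory. PREFIX is just an illustrative variable name. LD_LIBRARY_PATH matters for shared libraries you install into ~/lib, because the dynamic linker does not search there by default.

```bash
#!/usr/bin/env bash
# Sketch: prepare a per-user install prefix so that
# ./configure --prefix="$PREFIX" && make && make install never needs root.
# PREFIX is an illustrative name; it defaults to your home directory.
PREFIX="${PREFIX:-$HOME}"

# Directories that autotools-style builds install into under the prefix.
mkdir -p "$PREFIX/src" "$PREFIX/bin" "$PREFIX/lib" "$PREFIX/include"

# Let the current shell find installed programs and shared libraries.
# Add the same two export lines to ~/.bashrc to make them permanent.
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Typical build afterwards, run inside an unpacked source tree:
#   ./configure --prefix="$PREFIX" && make && make install
```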
I need to run a command/tool with sudo. What shall I do?
If a command/tool will not run without sudo, it is normally because either:
a- it should not be used in the first place,
b- some tool/lib it relies on is not installed properly, or
c- it needs privileges that we may not be willing to grant at the moment.
How can I install and use Weka?
You can run Weka from the Terminal (command line). All you need to do is:
wget http://prdownloads.sourceforge.net/weka/weka-3-6-10.zip
unzip weka-3-6-10.zip
cd weka-3-6-10
And you are done! To run it from inside the weka-3-6-10 directory, you can for example do:
java -cp ./weka.jar weka.classifiers.trees.J48 -t data/weather.nominal.arff -i
This page gives an overview of the available commands. The only difference from that page is that you need to add an explicit classpath to every command, by adding "-cp ./weka.jar" after "java" on your command line, just like in the previous example.
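To avoid typing the classpath every time, you could define a small bash function. This is a sketch under assumptions. WEKA_HOME is assumed to be where you unpacked weka-3-6-10.zip (here ~/weka-3-6-10). The function name weka is just an illustrative choice.

```bash
#!/usr/bin/env bash
# Sketch: run any Weka class without typing "-cp ./weka.jar" each time.
# Assumption: weka-3-6-10.zip was unpacked in your home directory.
WEKA_HOME="${WEKA_HOME:-$HOME/weka-3-6-10}"
WEKA_JAR="$WEKA_HOME/weka.jar"

# weka CLASS [OPTIONS...] puts the jar on the classpath explicitly,
# so it works from any directory, not only from inside $WEKA_HOME.
weka() {
    java -cp "$WEKA_JAR" "$@"
}

# Example, equivalent to the J48 command above but runnable from anywhere:
#   weka weka.classifiers.trees.J48 -t "$WEKA_HOME/data/weather.nominal.arff" -i
```

Alternatively, `export CLASSPATH="$WEKA_JAR"` also works, because `java` falls back to the CLASSPATH variable when no -cp option is given.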
How can I install IRSTLM?
cd
mkdir -p src lib bin
export PATH=$HOME/bin:$PATH
cd src
wget http://downloads.sourceforge.net/project/irstlm/irstlm/irstlm-5.80/irstlm-5.80.03.tgz
tar xvzf irstlm-5.80.03.tgz
cd irstlm-5.80.03
sh regenerate-makefiles.sh --force
./configure --prefix=$HOME
make
make install
How to install MITLM?
cd ~/src
wget https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz
tar xvzf mitlm-0.4.1.tar.gz
cd mitlm-0.4.1
autoconf
./configure --prefix=$HOME
make
make install
How can I tell Moses where to find Boost, KenLM or ORLM?
Before you compile Moses you can configure it to find whatever library it needs, for instance:
./configure --prefix=$HOME --with-boost=/path/to/boost
Please go to this link before you compile your copy.
How can I set my JAVA_HOME?
You can point it to either Java 6 or Java 7 (this example uses Java 7):
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
Then log out and log back in (or run: exec $SHELL).
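Running those echo lines twice leaves duplicate entries in ~/.bashrc. A small guard avoids that. This is a bash sketch, and append_once is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: append a line to a start-up file only if that exact line is not
# already present, so re-running a setup step does not duplicate it.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage for the JAVA_HOME setup. Single quotes keep $JAVA_HOME and $PATH
# literal, so they are expanded when ~/.bashrc is read, not now:
#   append_once 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' ~/.bashrc
#   append_once 'export PATH=$JAVA_HOME/bin:$PATH' ~/.bashrc
```

After logging back in, `echo $JAVA_HOME` and `java -version` should show the JDK you picked.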
How can I install CRF++?
cd ~/src
wget https://crfpp.googlecode.com/files/CRF%2B%2B-0.58.tar.gz
tar -zxvf CRF++-0.58.tar.gz
cd CRF++-0.58
./configure --prefix=$HOME
make
make install
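After make install, you can check that the CRF++ command-line tools (crf_learn and crf_test) are found by your shell. This is a bash sketch, and check_tools is a made-up helper name.

```bash
#!/usr/bin/env bash
# Sketch: report where each named command resolves on PATH.
# Returns 1 if any of them is missing.
check_tools() {
    local missing=0 tool path
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            printf '%-10s -> %s\n' "$tool" "$path"
        else
            printf '%-10s -> NOT FOUND (is $HOME/bin on your PATH?)\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage after installing CRF++:
#   check_tools crf_learn crf_test
```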
How can I install CRFsuite?
You need to download and build liblbfgs first:
cd ~/src
wget https://github.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz
tar -zxvf liblbfgs-1.10.tar.gz
cd liblbfgs-1.10
./configure --prefix=$HOME
make
make install
Now the same for CRFsuite:
cd ~/src
wget https://github.com/downloads/chokkan/crfsuite/crfsuite-0.12.tar.gz
tar -zxvf crfsuite-0.12.tar.gz
cd crfsuite-0.12
./configure --prefix=$HOME --with-liblbfgs=$HOME
make
make install
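The CRF++, liblbfgs and CRFsuite recipes differ only in URL, directory name and configure options, so you could wrap them in one bash function. This is only a sketch with some assumptions:

- install_from_tarball is a made-up name.
- It assumes the package ships a ready-made configure script. IRSTLM and MITLM need their regenerate-makefiles.sh / autoconf step first, so they do not fit as-is.
- The --with-liblbfgs option is how CRFsuite's configure script is told where liblbfgs lives. Check ./configure --help if your version differs.

```bash
#!/usr/bin/env bash
# Sketch: hypothetical helper wrapping the
# wget / tar / ./configure --prefix=$HOME / make / make install recipe.
# Usage: install_from_tarball URL DIR [extra ./configure options...]
#   URL - tarball to download
#   DIR - directory name the tarball unpacks to
install_from_tarball() {
    if [ "$#" -lt 2 ]; then
        echo "usage: install_from_tarball URL DIR [configure options...]" >&2
        return 2
    fi
    local url="$1" dir="$2"
    shift 2
    mkdir -p "$HOME/src" || return 1
    # Subshell, so the cd commands do not change your current directory.
    (
        cd "$HOME/src" &&
        wget -O "$dir.tar.gz" "$url" &&   # fixed local file name
        tar -xzf "$dir.tar.gz" &&
        cd "$dir" &&
        ./configure --prefix="$HOME" "$@" &&
        make &&
        make install                      # writes under $HOME only, no sudo
    )
}

# Examples matching the liblbfgs and CRFsuite steps above:
#   install_from_tarball https://github.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz liblbfgs-1.10
#   install_from_tarball https://github.com/downloads/chokkan/crfsuite/crfsuite-0.12.tar.gz crfsuite-0.12 --with-liblbfgs="$HOME"
```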
Is MySQL installed?
Yes. We have already tuned MySQL to suit most use cases and to perform better on our servers. These are the global variables we set:
key_buffer = 32G
max_allowed_packet = 1G
thread_stack = 1M
thread_cache_size = 20
max_connections = 300
table_cache = 150
query_cache_limit = 10M
query_cache_size = 160M
If there's something you think we should change, feel free to drop Ans an email. However, before you start using MySQL, please create your own username and password, create a database, and then grant your username privileges on that database only. This is the only way to guarantee that you don't mess with other people's data.
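These steps can be sketched as a bash function that prints the SQL, so you can review it before running it. The user name, database name and password are placeholders you choose. mysql_setup_sql is a hypothetical helper name. The statements themselves are standard MySQL CREATE DATABASE, CREATE USER and GRANT.

```bash
#!/usr/bin/env bash
# Sketch: print SQL that creates your own database and user, and grants that
# user privileges on that one database only.
mysql_setup_sql() {
    local user="$1" db="$2" password="$3" name
    # No SQL escaping is done here, so refuse anything that could break quoting.
    for name in "$user" "$db"; do
        case "$name" in
            ''|*[!A-Za-z0-9_]*) echo "names: letters, digits and _ only" >&2; return 1 ;;
        esac
    done
    case "$password" in
        ''|*"'"*|*\\*) echo "password: must be non-empty, no ' or \\" >&2; return 1 ;;
    esac
    cat <<EOF
CREATE DATABASE \`$db\`;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$password';
GRANT ALL PRIVILEGES ON \`$db\`.* TO '$user'@'localhost';
EOF
}

# Usage (read -s keeps the password out of your shell history):
#   read -rs -p 'New MySQL password: ' pw; echo
#   mysql_setup_sql alice alice_db "$pw" | mysql -u <account> -p
# where <account> is allowed to create users and grant privileges.
# From then on, always connect as yourself:
#   mysql -u alice -p alice_db
```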
Can I install a specific version of Ruby?
Yes, I would recommend using rbenv.
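A minimal per-user rbenv setup could look like this bash sketch. Everything goes into ~/.rbenv, so no sudo is needed. The repository URLs are the upstream rbenv and ruby-build projects. install_rbenv is a hypothetical helper name, and <version> is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: clone rbenv and its ruby-build plugin (if not already present)
# and activate rbenv in the current shell.
install_rbenv() {
    [ -d "$HOME/.rbenv" ] ||
        git clone https://github.com/rbenv/rbenv.git "$HOME/.rbenv" || return 1
    [ -d "$HOME/.rbenv/plugins/ruby-build" ] ||
        git clone https://github.com/rbenv/ruby-build.git "$HOME/.rbenv/plugins/ruby-build" || return 1
    export PATH="$HOME/.rbenv/bin:$PATH"
    eval "$(rbenv init -)"
}

# Then:
#   install_rbenv
#   rbenv install -l            # list installable versions
#   rbenv install <version>
#   rbenv global <version>
# To make it permanent, add the PATH export and 'eval "$(rbenv init -)"' to ~/.bashrc.
```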
Can I install a specific version of Python?
Yes, I would recommend using pyenv.
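Similarly, here is a bash sketch of a per-user pyenv setup. Python versions are built from source into ~/.pyenv/versions, which should be able to use the -dev packages already installed on the servers (see the list below). install_pyenv is a hypothetical helper name, and <version> is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: clone pyenv (if not already present) and activate it in the
# current shell. Nothing here needs root.
install_pyenv() {
    export PYENV_ROOT="$HOME/.pyenv"
    [ -d "$PYENV_ROOT" ] ||
        git clone https://github.com/pyenv/pyenv.git "$PYENV_ROOT" || return 1
    export PATH="$PYENV_ROOT/bin:$PATH"
    eval "$(pyenv init -)"
}

# Then:
#   install_pyenv
#   pyenv install --list        # see available versions
#   pyenv install <version>
#   pyenv global <version>      # or: pyenv local <version> inside a project
# To make it permanent, add the two exports and 'eval "$(pyenv init -)"' to ~/.bashrc.
```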
What libs/tools/packages are already installed?
build-essential libz-dev libreadline-dev libncursesw5-dev libssl-dev libgdbm-dev libsqlite3-dev libbz2-dev liblzma-dev tk-dev libdb-dev libreadline5-dev libc6-dev libpoppler-dev python-pip python-dev python-twisted flex zip unzip p7zip-full openjdk-7-jre-headless openjdk-7-jdk openjdk-6-jre-headless openjdk-6-jdk ant libtool gfortran libboost-all-dev automake mysql-server sysstat gawk maven gradle r-base r-base-dev smartmontools gsmartcontrol
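Before building your own copy of something, you can check whether a package from this list is already installed. This bash sketch uses dpkg-query, which comes with dpkg on Debian/Ubuntu systems. is_installed is a made-up helper name.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the named Debian/Ubuntu package is fully installed
# (packages that were removed but still have config files are not counted).
is_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Usage:
#   for p in libboost-all-dev r-base maven; do
#       if is_installed "$p"; then echo "$p: installed"; else echo "$p: missing"; fi
#   done
```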