Thursday, March 13, 2014

Setup/installation instructions for a crawler: Scrapy for scraping pages and Django for the frontend

Requirements:

Ubuntu 12.04 LTS.
Python version: 2.7.3


Setup:

sudo apt-get install python-pip
sudo pip install virtualenvwrapper

Copy the following into ~/.bashrc:
"""
export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/projects
source /usr/local/bin/virtualenvwrapper.sh
"""

mkvirtualenv crawler
workon crawler

Install the dependent packages:
sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
sudo apt-get install python2.7-dev
sudo apt-get install python-scrapy
sudo apt-get install libffi-dev
pip install Scrapy

If the above packages are already available in the global site-packages, you can make them visible inside the virtualenv by running the following virtualenvwrapper command:
toggleglobalsitepackages

MySQL:
sudo apt-get install python-mysqldb
sudo apt-get install mysql-client-core-5.5
sudo apt-get install mysql-server-core-5.5
sudo apt-get install mysql-server
pip install SQLAlchemy

Django:
pip install Django==1.6.1

BeautifulSoup:
pip install beautifulsoup4
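
Once the packages above are installed, a quick import check from inside the virtualenv can confirm that everything is visible to Python. The following is only a minimal sketch: the helper name missing_modules is made up for illustration, and the module names in the comment are the usual import names for Scrapy, Django, SQLAlchemy, python-mysqldb and beautifulsoup4.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Not installed, or not visible from the active virtualenv.
            missing.append(name)
    return missing


# Example call (hypothetical helper; import names for the packages above):
# print(missing_modules(["scrapy", "django", "sqlalchemy", "MySQLdb", "bs4"]))
```

An empty list means every module was importable. Any name that is returned needs to be installed, or exposed with toggleglobalsitepackages.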

Node.js for running JavaScript from the terminal/command line:
http://nodejs.org/download/

Set the environment variable:
Be careful about the ordering of paths: the virtualenv site-packages directory must come before the system-installed Python packages. Replace /home/vjonnak with your own home directory.
export PYTHONPATH="/home/vjonnak/.virtualenvs/crawler/local/lib/python2.7/site-packages:/usr/lib/python2.7/dist-packages"
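
Since the ordering matters, it can be checked with a small script. This is a rough sketch rather than part of the original setup: the function name and the two marker strings (".virtualenvs" and "dist-packages") are assumptions based on the paths shown above.

```python
import os


def venv_precedes_system(pythonpath, venv_marker=".virtualenvs",
                         system_marker="dist-packages"):
    """Return True if a virtualenv entry comes before any system entry."""
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    # Index of the first entry that lives inside the virtualenv directory.
    venv_idx = next((i for i, p in enumerate(entries) if venv_marker in p), None)
    # Index of the first system entry that is not inside the virtualenv.
    sys_idx = next((i for i, p in enumerate(entries)
                    if system_marker in p and venv_marker not in p), None)
    if venv_idx is None:
        return False
    return sys_idx is None or venv_idx < sys_idx


if __name__ == "__main__":
    print(venv_precedes_system(os.environ.get("PYTHONPATH", "")))
```

Run it in the shell where PYTHONPATH was exported. It prints False when the system packages would shadow the virtualenv, or when no virtualenv entry is present at all.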

Create the database with UTF-8 as the default encoding (run this in the mysql client):
CREATE DATABASE news_db DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
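
To keep the connection itself in UTF-8, the charset can also be passed in the SQLAlchemy connection URL. A minimal sketch follows, assuming the MySQLdb driver installed above. The helper name mysql_url and the user, password and host values are placeholders, not part of the original setup.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2.7


def mysql_url(user, password, host="localhost", database="news_db",
              charset="utf8"):
    """Build a SQLAlchemy-style MySQL URL that requests a UTF-8 connection."""
    # quote_plus escapes characters such as '@' or ':' in the credentials.
    return "mysql://{0}:{1}@{2}/{3}?charset={4}".format(
        quote_plus(user), quote_plus(password), host, database, charset)


# Hypothetical usage with SQLAlchemy (credentials are placeholders):
# from sqlalchemy import create_engine
# engine = create_engine(mysql_url("crawler", "secret"))
```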




