I Lessen Data: 2014

Sunday, October 19, 2014

Starting MongoDB on OS X

When you install MongoDB on a Mac with Homebrew, it spits out this important info:

To have launchd start mongodb at login:
ln -sfv /usr/local/opt/mongodb/*.plist ~/Library/LaunchAgents
Then to load mongodb now:
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mongodb.plist
Or, if you don't want/need launchctl, you can just run:
mongod --config /usr/local/etc/mongod.conf

This matters because the MongoDB docs tell you to start it with "sudo service mongodb start", which doesn't work on Macs. Macs use "launchctl" instead of "service" (and the two work differently). The jobs that launchctl starts at login are defined by plist files in ~/Library/LaunchAgents.
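If you're curious what one of those plist files actually tells launchd to run, here's a minimal sketch using Python's standard plistlib. The job definition below is made up for illustration (it is not the real Homebrew plist), and launch_command is just a name I picked.

```python
import plistlib


def launch_command(plist_bytes):
    """Return the command line a launchd job would run, as a list of strings."""
    job = plistlib.loads(plist_bytes)
    # launchd jobs usually list their command under ProgramArguments;
    # Program is the simpler single-path alternative.
    if "ProgramArguments" in job:
        return list(job["ProgramArguments"])
    return [job["Program"]] if "Program" in job else []


# A made-up job definition for illustration, not the actual Homebrew plist.
sample = plistlib.dumps({
    "Label": "example.mongodb",
    "ProgramArguments": ["mongod", "--config", "/usr/local/etc/mongod.conf"],
    "RunAtLoad": True,
})

print(launch_command(sample))
```

To inspect a real job, read the file's bytes (open it in 'rb' mode) from ~/Library/LaunchAgents and pass them in instead of the sample.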

There used to be a thing called "brew services", but now if you use that, it will tell you that it's unsupported and will be removed soon, so I guess don't use it.

Or, if you just want to run it for a little while, you might try starting it with plain "mongod". That didn't work for me, though, because you have to tell it where the config file is, and the config file tells mongo where the database should be (without one, mongod falls back to its default data directory, /data/db, which probably doesn't exist). Of course, you might not know where the config file or the database are. That's what the last line (mongod --config ...) is for.
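If you do want to know where the database lives, the config file says so. Here's a rough sketch that pulls the data directory out of the config text. It assumes the setting is spelled either "dbpath = ..." (older style) or "dbPath: ..." (YAML style); the sample contents are made up, find_dbpath is a hypothetical helper, and a real YAML parser would be more robust.

```python
import re

# Matches "dbpath = /some/path" (older INI-style config) as well as
# "  dbPath: /some/path" (YAML-style config). Lines starting with "#"
# do not match because the key must be the first thing on the line.
DBPATH_PATTERN = re.compile(
    r"^\s*dbpath\s*[:=]\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE
)


def find_dbpath(config_text):
    """Return the database directory named in the config text, or None."""
    match = DBPATH_PATTERN.search(config_text)
    if match is None:
        return None
    # Drop surrounding quotes, if the path was quoted.
    return match.group(1).strip("\"'")


# Made-up config contents for illustration only.
sample_config = "storage:\n  dbPath: /usr/local/var/mongodb\n"
print(find_dbpath(sample_config))
```

For a real install, read the text of /usr/local/etc/mongod.conf and pass that in.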

Another tip: MongoDB logs are stored by default in:
/var/log/mongodb/mongodb.log

Sunday, February 23, 2014

Making maps in python, step 1: installing dependencies

I want to plot some things on a map. I got one guide from here, that uses Basemap.

Basemap:
brew install geos
brew install gdal
brew install gfortran

pip install matplotlib
pip install numpy
pip install pandas
pip install shapely
pip install basemap --allow-external basemap --allow-unverified basemap
pip install scipy
pip install pysal
pip install Fiona
pip install descartes
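
Once everything is installed, a quick sanity check is to try importing each package. This is a small sketch, not part of the guide: check_imports and MAP_MODULES are names I made up, and the list uses the import names, which in a couple of cases differ from the pip names.

```python
import importlib

# Import names for the packages installed above. Some differ from the pip
# names: Fiona imports as "fiona", and basemap lives under mpl_toolkits.
MAP_MODULES = [
    "matplotlib", "numpy", "pandas", "shapely", "mpl_toolkits.basemap",
    "scipy", "pysal", "fiona", "descartes",
]


def check_imports(module_names):
    """Map each module name to True if it imports cleanly, else False."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Usually ImportError, but a broken native library can raise
            # other errors at import time, so catch broadly here.
            results[name] = False
    return results
```

Calling check_imports(MAP_MODULES) gives a dict of module name to True/False, so anything marked False still needs attention.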

Okay, geez, I had written a ton more in this Blogger window, it looked like it was saving my draft, but then it... somehow didn't? Jesus, nothing on computers ever works. All right, I'll try to recreate it:

On Ubuntu, I wanted to pip install as much as possible (because of virtualenv), so I did stuff like this:
sudo apt-get build-dep numpy scipy matplotlib
pip install numpy scipy matplotlib
I think I just used pip for pandas, shapely, pysal, descartes. Fiona required something else, like apt-getting gdal or something?

Kartograph:
Ubuntu: http://kartograph.org/docs/kartograph.py/install-ubuntu.html
OSX: pip install -r https://raw.github.com/kartograph/kartograph.py/master/requirements.txt worked, I think because I already had installed gdal via brew.

Cartopy:
Ubuntu: https://github.com/SciTools/cartopy/issues/46
OSX: https://github.com/SciTools/cartopy/issues/48

Monday, February 3, 2014

Python sentence segmentation, kind of quick and mostly legit

Sentence segmentation (splitting a big block of text into sentences) is not trivial. You can't just split on periods, for example, because you'll get tripped up on every Dr. and Ms. and etc. and so on! However, it's mostly a solved problem with library support, so here's a quick way to do it in Python.
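
To see the problem concretely, here's a tiny example (the text is made up) of what naive splitting on periods does to two perfectly ordinary sentences:

```python
text = "Dr. Smith met Ms. Jones. They talked for an hour."

# Naive approach: treat every period followed by a space as a sentence boundary.
naive_sentences = text.split(". ")

# The abbreviations get cut off as if they ended sentences.
for piece in naive_sentences:
    print(repr(piece))
```

That gives four fragments ('Dr', 'Smith met Ms', 'Jones', 'They talked for an hour.') instead of the two sentences actually in the text.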

NLTK is a pretty general-purpose natural language processing toolkit. You could install the whole thing via instructions on their website. But that will also install a lot of other NLP tools. Also, a lot of these tools can be trained, which makes them more accurate if you have training data, but more difficult to get started if you don't have such training data. To get a pre-trained model:

- download Punkt from NLTK Data (direct link to Punkt)
- unzip it and copy english.pickle into the same directory as your python file. This is the trained model, which has been serialized out to a file. (obviously, this assumes you're segmenting English text; if not, grab one of the other .pickle files.)
- in your python code, unpickle it like so:
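import pickle
# Open in binary mode: pickle files are binary data, and Python 3 requires 'rb'.
# Note: nltk still has to be importable, since the pickle refers to NLTK's Punkt classes.
with open('english.pickle', 'rb') as segmenter_file:
    sentence_segmenter = pickle.load(segmenter_file)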
- then call:
sentences = sentence_segmenter.tokenize(text)
(where "text" is a string containing all your text).