Cleaning and Mobility Thoughts

Just a couple of thoughts on moving from a mass of txt to csv:

In the data acquired from the Library of Congress catalog, Beirut was often listed as a secondary publication location.

This brings up a crucial question for me in this project: the extent to which books in the Arab world were edited in one place and printed and distributed elsewhere. This seems to be particularly important from the 1950s onward, as terms like “distribution,” “printing,” “translation,” and “publishing” show up in the names of businesses. It does seem important that we collect such metadata about the book business.

Using the regular expression \.* Bayrūt \.* Beirut \.* Beyrouth, I was able to reorder the cases where a printing took place in Lebanon but the publication was listed in Beirut.
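As a rough illustration of this kind of matching, here is a minimal Python sketch. It is not the expression quoted above. It assumes each record has a list of place strings, and it simply moves any entry that mentions Beirut (in one of the spellings used in these notes) to the front. The function names are hypothetical.

```python
import re
import unicodedata

# Spellings of Beirut used in these notes: transliterated Arabic (with or
# without the macron), English, French, and Arabic script.
# \u016b is "ū"; the last alternative is the Arabic-script name of the city.
BEIRUT = re.compile(
    r"\b(?:Bayr[\u016bu]t|Beirut|Beyrouth)\b|\u0628\u064a\u0631\u0648\u062a",
    re.IGNORECASE,
)

def mentions_beirut(place: str) -> bool:
    """Return True if the place string contains any spelling of Beirut."""
    # NFC normalisation turns "u" + combining macron into the single
    # character "ū", so both encodings of Bayrūt are matched.
    normalized = unicodedata.normalize("NFC", place)
    return BEIRUT.search(normalized) is not None

def beirut_first(places):
    """Reorder place strings so Beirut entries come first (stable sort)."""
    return sorted(places, key=lambda p: not mentions_beirut(p))
```

Because the sort is stable, the relative order of the non-Beirut places is left as it was in the catalog record.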

However, we will want to include the other countries in cooperation with which specific publishers were active (or include this in the interviews). Important countries thus far seem to be Iraq, Sudan, Tunisia, Morocco, Syria, Egypt, Kuwait, KSA, Libya, USA, and Yemen. That Beirut was publishing for the Arab world and was then linked to circuits of distribution in the rest of the world is central to our research project.

Scraping and grabbing

The project-based element of this course will be creating a set of spatio-temporal narratives about publishing and bookselling in Beirut.

One of the issues we have faced in planning this course is how to grab large amounts of publishing data about Beirut and transform it into a historical database of publishers, their locations, the subjects they published, the languages they published in, and their dates of activity.

We have found some born-digital lists and some lists in books, and we are working on geocoding them to provide students with a baseline.

First, I used web scraping techniques to pull down lists of publishers and bookstores currently in operation from the Yellow Pages.

Second, I wanted to automate the process of grabbing a list from the Library of Congress and NYPL. The best approach I have thought of for now is this search URL:

https://catalog.loc.gov/vwebv/search?searchArg1=bayrut+beirut+beyrouth+%D8%A8%D9%8A%D8%B1%D9%88%D8%AA&argType1=any&searchCode1=KPUB&searchType=2&combine2=and&searchArg2=1946&argType2=all&searchCode2=KPUB&combine3=and&searchArg3=&argType3=all&searchCode3=GKEY&year=1515-2015&fromYear=&toYear=&location=all&place=all&type=all&language=all&recCount=100

This searches for the city of Beirut in three languages, together with a year, in the KPUB info of the LOC catalog.

(In the URL, the percent-encoded string %D8%A8%D9%8A%D8%B1%D9%88%D8%AA is بيروت, and the integer 1946 is the year of publication.) As one might expect, the numbers grew slowly throughout the 1950s, 1960s, and 1970s. From 1975 to 1976, the number of holdings drops from 229 to 89. They shoot back up to almost 300 in 1977.
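To repeat this search for every year, the URL can be generated rather than edited by hand. The sketch below rebuilds the same query string with Python's standard library, changing only the year. The function name and the `rec_count` argument are my own, and the code only builds the URLs. It does not fetch or parse any catalog pages.

```python
from urllib.parse import urlencode

BASE = "https://catalog.loc.gov/vwebv/search"

def loc_search_url(year: int, rec_count: int = 100) -> str:
    """Build the LOC catalog search URL for Beirut imprints in a given year."""
    params = [
        # Four search terms: bayrut, beirut, beyrouth, and بيروت
        ("searchArg1", "bayrut beirut beyrouth \u0628\u064a\u0631\u0648\u062a"),
        ("argType1", "any"),
        ("searchCode1", "KPUB"),
        ("searchType", "2"),
        ("combine2", "and"),
        ("searchArg2", str(year)),  # year of publication
        ("argType2", "all"),
        ("searchCode2", "KPUB"),
        ("combine3", "and"),
        ("searchArg3", ""),
        ("argType3", "all"),
        ("searchCode3", "GKEY"),
        ("year", "1515-2015"),
        ("fromYear", ""),
        ("toYear", ""),
        ("location", "all"),
        ("place", "all"),
        ("type", "all"),
        ("language", "all"),
        ("recCount", str(rec_count)),
    ]
    # urlencode turns spaces into "+" and percent-encodes the Arabic as UTF-8.
    return BASE + "?" + urlencode(params)

# One URL per year, for example for 1946 through 1979.
urls = {year: loc_search_url(year) for year in range(1946, 1980)}
```

Calling `loc_search_url(1946)` should reproduce the URL given above character for character, which is a quick way to check that nothing was lost in the encoding.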

Third, I am going to strip this data of all but its date, publisher, and language, verifying that it actually contains only publications from Beirut. Students who would like to study the thematic evolution of publishing over the century will be able to mine this data.
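A minimal sketch of that stripping step, assuming the catalog records have already been saved as a csv: the column names (place, publisher, year, language) are hypothetical, and the sample rows are invented placeholders rather than real catalog data.

```python
import csv
import io
import re

# Spellings of Beirut used in these notes (\u016b is "ū"; the last
# alternative is the Arabic-script name).
BEIRUT = re.compile(
    r"Bayr[\u016bu]t|Beirut|Beyrouth|\u0628\u064a\u0631\u0648\u062a",
    re.IGNORECASE,
)

# Hypothetical column names for the fields we want to keep.
KEEP = ("year", "publisher", "language")

def strip_records(rows):
    """Keep only Beirut records, reduced to year, publisher, and language."""
    kept = []
    for row in rows:
        # A missing column is treated as an empty string.
        place = row.get("place") or ""
        if BEIRUT.search(place):
            kept.append({k: (row.get(k) or "").strip() for k in KEEP})
    return kept

# Invented placeholder rows, only to show the shape of the data.
sample = io.StringIO(
    "title,place,publisher,year,language\n"
    "Example A,Bayr\u016bt,Publisher X,1975,ara\n"
    "Example B,al-Q\u0101hirah,Publisher Y,1975,ara\n"
    "Example C,Beyrouth,Publisher Z,1976,fre\n"
)
records = strip_records(csv.DictReader(sample))
```

Records whose place field does not mention Beirut are dropped here. For the real data it may be better to set them aside for manual review.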

More thought needs to be given to what kind of restricted set the publications coming from the LOC represent.