Web Scrapping

Security Exchange Commission: web browsing emulation and natural language processing

In this tutorial, I explain how to emulate navigation on the SEC website, perform bulk downloads of forms (e.g. 10-K forms) and extract for each company a sentiment grade, based on basic natural language processing methods (dictionaries). The full Java package including two dictionaries of positive and negative tone words is available on my github. The SEC used to provide all forms on a FTP server, however, two years ago they stopped it and we now need this little workaround to perform bulk downloads.