Accessing historical data of listed companies on the Nairobi Securities Exchange is an expensive affair. Other than buying the data here, I have not come across any other way to access it without paying. It is for this reason that I devised a way to parse the available past data from the Financial Times webpage.
The idea is to open the webpage of, let's say, Safaricom Ltd, download the page and extract the data it contains using the Python library Beautiful Soup, running from Matlab. The cleaned data is presented as a table variable, or written in a format supported by Excel, such as CSV.
Assumptions
You have Matlab installed, together with a Python installation and the Python library Beautiful Soup. Read on how to install Beautiful Soup. In this method, Beautiful Soup is executed from the Matlab interface and not in the Python console. Matlab R2014b and Beautiful Soup 4.4.5 are used in this demo. You have working knowledge of Matlab, HTML and Python.
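A quick way to confirm that Matlab can see your Python installation and Beautiful Soup is the short check below. This is a minimal sketch: it only assumes that Beautiful Soup 4 is importable under its standard module name, bs4.

% Check which Python interpreter Matlab is configured to use (pyversion is available in R2014b).
[version, executable, isloaded] = pyversion;
fprintf('Matlab is using Python %s at %s\n', version, executable);

% Try to import Beautiful Soup (module name bs4). An error here means the library
% is not installed for the interpreter Matlab is pointing at.
bs4 = py.importlib.import_module('bs4');
disp(bs4);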
Url Format
First, specify the url to the webpage holding the data of the specific company. The url is made up of two parts. The first part specifies the path to the general historical data. The second part identifies the specific security symbol.
Say we want to download Safaricom data: the first part is 'http://markets.ft.com/data/equities/tearsheet/historical?s=' and the second part is the symbol 'SCOM:NAI'. The full url is http://markets.ft.com/data/equities/tearsheet/historical?s=SCOM:NAI. Using an input prompt, you can specify which symbol you wish to download directly from the Matlab interface.
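The snippet below is a minimal sketch of this step. The prompt text and variable names are illustrative; urlread() is used because it is available in R2014b.

% Base url for the FT historical data page and the security symbol.
base_url = 'http://markets.ft.com/data/equities/tearsheet/historical?s=';
symbol   = input('Enter the security symbol (e.g. SCOM:NAI): ', 's');

% Build the full url and download the raw HTML of the page.
full_url = [base_url symbol];
html     = urlread(full_url);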
Search For the Table
After downloading the webpage, we convert the page to a Beautiful Soup object. In the object we find where the data table sits among the HTML tags. The webpage has only one table. The table tag consists of table rows (<tr>) and table data cells (<td>) that lay the data out clearly. Each row in the table is extracted and the ResultSet converted to a Matlab cell array. Before extracting the data, prepare the data containers that will store the data after parsing from the table. Each of the columns: date, open, high, low, close and volume, will be stored in a cell.
Get Data From Each Row
Having extracted the rows and prepared the data containers, the next step is to search for the data stored in the tags within the rows. A for-loop that skips the first row of the table (the first row holds the column headers) iterates over the remaining rows to get the data from each one. Using get_text() from Beautiful Soup, the data is parsed from each tag and stored in the respective data container. At this stage, the parsed data is assumed to be in the Matlab workspace, ready for the cleaning process. Since the data is taken from the tags as strings, the OHLC data is converted to a numeric vector.
Data Cleaning
The date and volume data come in a messy form. Let's start with volume: the figure merges the whole number with a two-decimal-place number that has an 'm' or 'k' at the end, denoting million or thousand respectively. For instance, if 19,517,400 shares were traded today, the figure obtained is in this format: '19,517,40019.52m'. After cleaning, the code removes the '19.52m' part and keeps the 19,517,400 part. The same applies to the dates. The raw format extracted looks like this: 'Wednesday, July 20, 2016Wed, Jul 20, 2016'. After cleaning, the date is obtained as 20-Jul-2016. The datetime variable is convenient to use during any sort of analysis.
The last stage is to combine the data vectors into a table. Alternatively, the data can be written to a csv file. First convert the date column to dates that Excel understands. Then create a matrix of all the data columns. Finally, write the data using the csvwrite() function. The csv is saved in the current directory. A sketch covering the extraction and cleaning steps is given after the list below.
This script is limited to downloading data that is available on the webpage only, usually the last 30 days of data for each symbol. However, if you want all the available historical data, the process is as follows:
1. Open the Financial Times webpage manually in your browser.
2. Search for the company's historical data in the market data tab.
3. On the historical data page, scroll down the data table until you get a Show more button.
4. Keep clicking the Show more button until all the data you want is loaded on the webpage.
5. Right-click the webpage and save it on your device.
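Here is a minimal sketch of the extraction and cleaning steps described above, assuming the page HTML is already in the variable html (downloaded earlier) and that the table columns appear in the order date, open, high, low, close, volume. The variable names, the regular expressions used to split the doubled date and volume strings, and the use of exceltime() for Excel serial dates are all illustrative assumptions and may need adjusting to the page's actual markup.

% Parse the downloaded page into a Beautiful Soup object.
soup = py.bs4.BeautifulSoup(html, 'html.parser');

% The page has only one table; pull out all of its rows (<tr> tags)
% and convert the ResultSet into a Matlab cell array.
rows = cell(py.list(soup.find_all('tr')));

% Prepare containers for each column (the first row holds the headers).
n      = numel(rows) - 1;
dates  = cell(n, 1);
open_  = zeros(n, 1);
high   = zeros(n, 1);
low    = zeros(n, 1);
close_ = zeros(n, 1);
volume = zeros(n, 1);

for k = 2:numel(rows)                       % skip the header row
    cells = cell(py.list(rows{k}.find_all('td')));

    % get_text() returns a Python string; char() turns it into a Matlab char array.
    rawDate = char(cells{1}.get_text());
    rawVol  = char(cells{6}.get_text());

    % OHLC columns come through as plain numbers (possibly with thousands commas).
    open_(k-1)  = str2double(strrep(char(cells{2}.get_text()), ',', ''));
    high(k-1)   = str2double(strrep(char(cells{3}.get_text()), ',', ''));
    low(k-1)    = str2double(strrep(char(cells{4}.get_text()), ',', ''));
    close_(k-1) = str2double(strrep(char(cells{5}.get_text()), ',', ''));

    % Volume arrives as e.g. '19,517,40019.52m'; keep the comma-grouped whole
    % number at the start and drop the trailing abbreviated figure.
    tok = regexp(rawVol, '^\d{1,3}(,\d{3})*', 'match', 'once');
    volume(k-1) = str2double(strrep(tok, ',', ''));

    % Date arrives as e.g. 'Wednesday, July 20, 2016Wed, Jul 20, 2016';
    % capture the long form without the day name, i.e. 'July 20, 2016'.
    tok = regexp(rawDate, '^\w+, (\w+ \d{1,2}, \d{4})', 'tokens', 'once');
    dates{k-1} = tok{1};
end

% Convert the date strings into datetime values, displayed as e.g. 20-Jul-2016.
dt = datetime(dates, 'InputFormat', 'MMMM d, yyyy');

% Combine the columns into a table variable ...
T = table(dt, open_, high, low, close_, volume, ...
    'VariableNames', {'Date', 'Open', 'High', 'Low', 'Close', 'Volume'});

% ... or write a csv in the current directory, with the dates converted to
% Excel serial numbers (exceltime works on datetime arrays from R2014b).
M = [exceltime(dt), open_, high, low, close_, volume];
csvwrite('historical_prices.csv', M);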
Parsing a Locally Saved HTML File
The script can parse data from a locally saved file in a similar way. The only change is replacing the url to the online page with the path to the local file. For instance:
html_file = urlread('file:///Users/wachiranguni/Desktop/python/Safaricom Ltd, SCOM_ NAI historical prices - FT.com.csv');
Later the script will be packaged into an Excel add-in that will make it more useful to a wider audience. Note that this script can be used to download data for all the NSE-listed companies, and any other historical price data set on the Financial Times website.
Update: 2/07/2018
This method is no longer functional because ft.com no longer displays historical data on its web pages without a subscription. However, investing.com is an alternative site from which you can download free NSE data.