{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example usage of `sgfixedincome_pkg` (Scraper)\n", "\n", "This Jupyter Notebook vignette shows how to use some of `sgfixedincome_pkg`'s bank fixed deposit website scraping functions. The functions shown here are primarily found in `scraper.py` and `consolidate.py`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extracting fixed deposit tables\n", "Many banks have websites with tables containing fixed deposit rates details. These tables are often formatted to look something like this (see DBS bank's page below):\n", "\n", "![DBS Website Screenshot](dbs_screenshot.png)\n", "\n", "We provide functions to extract such tables from the website and to convert them into a pandas dataframe:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Period$1,000 - $9,999$10,000 - $19,999$20,000 - $49,999$50,000 - $99,999$100,000 - $249,999$250,000 - $499,999$500,000 - $999,999
01 mth0.30000.30000.05000.05000.05000.05000.0500
12 mths0.50000.50000.05000.05000.05000.05000.0500
23 mths1.00001.00000.05000.05000.05000.05000.0500
34 mths1.40001.40000.05000.05000.05000.05000.0500
45 mths1.75001.75000.05000.05000.05000.05000.0500
56 mths2.15002.15000.05000.05000.05000.05000.0500
67 mths2.30002.30000.05000.05000.05000.05000.0500
78 mths2.35002.35000.05000.05000.05000.05000.0500
89 mths2.35002.35000.05000.05000.05000.05000.0500
910 mths2.40002.40000.05000.05000.05000.05000.0500
1011 mths2.40002.40000.05000.05000.05000.05000.0500
1112 mths2.45002.45000.05000.05000.05000.05000.0500
\n", "
" ], "text/plain": [ " Period $1,000 - $9,999 $10,000 - $19,999 $20,000 - $49,999 \\\n", "0 1 mth 0.3000 0.3000 0.0500 \n", "1 2 mths 0.5000 0.5000 0.0500 \n", "2 3 mths 1.0000 1.0000 0.0500 \n", "3 4 mths 1.4000 1.4000 0.0500 \n", "4 5 mths 1.7500 1.7500 0.0500 \n", "5 6 mths 2.1500 2.1500 0.0500 \n", "6 7 mths 2.3000 2.3000 0.0500 \n", "7 8 mths 2.3500 2.3500 0.0500 \n", "8 9 mths 2.3500 2.3500 0.0500 \n", "9 10 mths 2.4000 2.4000 0.0500 \n", "10 11 mths 2.4000 2.4000 0.0500 \n", "11 12 mths 2.4500 2.4500 0.0500 \n", "\n", " $50,000 - $99,999 $100,000 - $249,999 $250,000 - $499,999 \\\n", "0 0.0500 0.0500 0.0500 \n", "1 0.0500 0.0500 0.0500 \n", "2 0.0500 0.0500 0.0500 \n", "3 0.0500 0.0500 0.0500 \n", "4 0.0500 0.0500 0.0500 \n", "5 0.0500 0.0500 0.0500 \n", "6 0.0500 0.0500 0.0500 \n", "7 0.0500 0.0500 0.0500 \n", "8 0.0500 0.0500 0.0500 \n", "9 0.0500 0.0500 0.0500 \n", "10 0.0500 0.0500 0.0500 \n", "11 0.0500 0.0500 0.0500 \n", "\n", " $500,000 - $999,999 \n", "0 0.0500 \n", "1 0.0500 \n", "2 0.0500 \n", "3 0.0500 \n", "4 0.0500 \n", "5 0.0500 \n", "6 0.0500 \n", "7 0.0500 \n", "8 0.0500 \n", "9 0.0500 \n", "10 0.0500 \n", "11 0.0500 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sgfixedincome_pkg as sfi\n", "\n", "# Get website as parsed HTML BeautifulSoup object\n", "url = \"https://www.dbs.com.sg/personal/rates-online/fixed-deposit-rate-singapore-dollar.page\"\n", "soup = sfi.fetch_webpage(url)\n", "\n", "# Extract table element\n", "table_class = \"tbl-primary mBot-24\"\n", "tables = sfi.extract_table(soup, table_class)\n", "table = tables[0] # Since there is only 1 table in this website with that class\n", "\n", "# Convert table to pandas dataframe\n", "raw_df = sfi.table_to_df(table)\n", "raw_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reshape raw tables\n", "\n", "While these tables are easy to interpret visually, it is hard to analyze the data in this format, especially once we consider multiple similar tables from multiple banks. The `reshape_table()` reformats this information. The result is a table with tenure (in months), rate, deposit lower bound, and deposit upper bound columns:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TenureRateDeposit lower boundDeposit upper bound
01.00.301000.09999.0
11.00.3010000.019999.0
21.00.0520000.049999.0
31.00.0550000.099999.0
41.00.05100000.0249999.0
...............
7912.00.0520000.049999.0
8012.00.0550000.099999.0
8112.00.05100000.0249999.0
8212.00.05250000.0499999.0
8312.00.05500000.0999999.0
\n", "

84 rows × 4 columns

\n", "
" ], "text/plain": [ " Tenure Rate Deposit lower bound Deposit upper bound\n", "0 1.0 0.30 1000.0 9999.0\n", "1 1.0 0.30 10000.0 19999.0\n", "2 1.0 0.05 20000.0 49999.0\n", "3 1.0 0.05 50000.0 99999.0\n", "4 1.0 0.05 100000.0 249999.0\n", ".. ... ... ... ...\n", "79 12.0 0.05 20000.0 49999.0\n", "80 12.0 0.05 50000.0 99999.0\n", "81 12.0 0.05 100000.0 249999.0\n", "82 12.0 0.05 250000.0 499999.0\n", "83 12.0 0.05 500000.0 999999.0\n", "\n", "[84 rows x 4 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sfi.reshape_table(raw_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Entire single-website scrape in one function\n", "\n", "Instead of having to write multiple lines of code, use the `scrape_deposit_rates()` function to carry out the entire process above, and add some additional information to the final dataframe:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TenureRateDeposit lower boundDeposit upper boundRequired multiplesProduct providerProduct
01.00.301000.09999.0NoneDBSFixed Deposit
11.00.3010000.019999.0NoneDBSFixed Deposit
21.00.0520000.049999.0NoneDBSFixed Deposit
31.00.0550000.099999.0NoneDBSFixed Deposit
41.00.05100000.0249999.0NoneDBSFixed Deposit
........................
7912.00.0520000.049999.0NoneDBSFixed Deposit
8012.00.0550000.099999.0NoneDBSFixed Deposit
8112.00.05100000.0249999.0NoneDBSFixed Deposit
8212.00.05250000.0499999.0NoneDBSFixed Deposit
8312.00.05500000.0999999.0NoneDBSFixed Deposit
\n", "

84 rows × 7 columns

\n", "
" ], "text/plain": [ " Tenure Rate Deposit lower bound Deposit upper bound Required multiples \\\n", "0 1.0 0.30 1000.0 9999.0 None \n", "1 1.0 0.30 10000.0 19999.0 None \n", "2 1.0 0.05 20000.0 49999.0 None \n", "3 1.0 0.05 50000.0 99999.0 None \n", "4 1.0 0.05 100000.0 249999.0 None \n", ".. ... ... ... ... ... \n", "79 12.0 0.05 20000.0 49999.0 None \n", "80 12.0 0.05 50000.0 99999.0 None \n", "81 12.0 0.05 100000.0 249999.0 None \n", "82 12.0 0.05 250000.0 499999.0 None \n", "83 12.0 0.05 500000.0 999999.0 None \n", "\n", " Product provider Product \n", "0 DBS Fixed Deposit \n", "1 DBS Fixed Deposit \n", "2 DBS Fixed Deposit \n", "3 DBS Fixed Deposit \n", "4 DBS Fixed Deposit \n", ".. ... ... \n", "79 DBS Fixed Deposit \n", "80 DBS Fixed Deposit \n", "81 DBS Fixed Deposit \n", "82 DBS Fixed Deposit \n", "83 DBS Fixed Deposit \n", "\n", "[84 rows x 7 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = \"https://www.dbs.com.sg/personal/rates-online/fixed-deposit-rate-singapore-dollar.page\"\n", "table_class = \"tbl-primary mBot-24\"\n", "provider = \"DBS\" # Add info on bank name into dataframe\n", "sfi.scrape_deposit_rates(url, table_class, provider) # req_multiples defaults to None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scrape multiple bank websites\n", "\n", "Instead of scraping bank websites one by one, use `create_banks_df()` to scrape multiple bank websites and store data in a single dataframe. Information on failed scrapes are also provided (both the website that we failed to scrape from, and the error encountered). The function requires a list of tuples as input. \n", "\n", "Below, we scrape data from DBS and OCBC's website simultaneously, and attempt to scrape from UOB's website with a wrong table class. We also include an optional `req_multiples` input for OCBC in the hypothetical case that they require investments in multiples of \\$500:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Failed scraping processes: [{'product': 'UOB bank fixed deposit', 'error': 'Failed to scrape deposit rates for UOB: No tables found with the specified class.'}]\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TenureRateDeposit lower boundDeposit upper boundRequired multiplesProduct providerProduct
01.00.301000.009999.0NoneDBSFixed Deposit
11.00.3010000.0019999.0NoneDBSFixed Deposit
21.00.0520000.0049999.0NoneDBSFixed Deposit
31.00.0550000.0099999.0NoneDBSFixed Deposit
41.00.05100000.00249999.0NoneDBSFixed Deposit
........................
20136.00.20100000.00249999.0500OCBCFixed Deposit
20236.00.20250000.00499999.0500OCBCFixed Deposit
20336.00.20500000.00999999.0500OCBCFixed Deposit
20448.03.005000.0020000.0500OCBCFixed Deposit
20548.00.2020000.0150000.0500OCBCFixed Deposit
\n", "

206 rows × 7 columns

\n", "
" ], "text/plain": [ " Tenure Rate Deposit lower bound Deposit upper bound \\\n", "0 1.0 0.30 1000.00 9999.0 \n", "1 1.0 0.30 10000.00 19999.0 \n", "2 1.0 0.05 20000.00 49999.0 \n", "3 1.0 0.05 50000.00 99999.0 \n", "4 1.0 0.05 100000.00 249999.0 \n", ".. ... ... ... ... \n", "201 36.0 0.20 100000.00 249999.0 \n", "202 36.0 0.20 250000.00 499999.0 \n", "203 36.0 0.20 500000.00 999999.0 \n", "204 48.0 3.00 5000.00 20000.0 \n", "205 48.0 0.20 20000.01 50000.0 \n", "\n", " Required multiples Product provider Product \n", "0 None DBS Fixed Deposit \n", "1 None DBS Fixed Deposit \n", "2 None DBS Fixed Deposit \n", "3 None DBS Fixed Deposit \n", "4 None DBS Fixed Deposit \n", ".. ... ... ... \n", "201 500 OCBC Fixed Deposit \n", "202 500 OCBC Fixed Deposit \n", "203 500 OCBC Fixed Deposit \n", "204 500 OCBC Fixed Deposit \n", "205 500 OCBC Fixed Deposit \n", "\n", "[206 rows x 7 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Define inputs (a list of tuples)\n", "scrape_inputs=[\n", " (\n", " \"https://www.dbs.com.sg/personal/rates-online/fixed-deposit-rate-singapore-dollar.page\",\n", " \"tbl-primary mBot-24\",\n", " \"DBS\"\n", " ),\n", " (\n", " \"https://www.uob.com.sg/personal/online-rates/singapore-dollar-time-fixed-deposit-rates.page\",\n", " \"non_existent_table_class\",\n", " \"UOB\"\n", " ),\n", " (\n", " \"https://www.ocbc.com/personal-banking/deposits/fixed-deposit-sgd-interest-rates.page\",\n", " \"table__comparison-table\",\n", " \"OCBC\",\n", " 500\n", " )\n", "]\n", "\n", "combined_df, failed_providers = sfi.create_banks_df(scrape_inputs)\n", "print(f\"Failed scraping processes: {failed_providers}\")\n", "combined_df # Display final dataframe" ] } ], "metadata": { "kernelspec": { "display_name": "sgfixedincome_pkg", "language": "python", "name": "sgfixedincome_pkg" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" } }, "nbformat": 4, "nbformat_minor": 4 }