# DeSAPher


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## The Problem

Working with SAP systems often means dealing with hundreds of tables
with cryptic names like MARA, VBAK, or MAKT. Understanding what these
tables contain and how they relate to each other traditionally requires
extensive documentation lookup or SAP expertise.

## Our Solution

We created a tool that automatically fetches and structures SAP table
definitions from online documentation. This gives us instant access
to:

- Field names and their meanings
- Data types and lengths
- Relationships between tables (via check tables)
- Comprehensive descriptions of each field
- Table names and descriptions
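
As a minimal sketch of what one structured field definition could look like, the record below holds the attributes listed above. The `FieldDefinition` class and its attribute names are illustrative assumptions, not the tool's actual data model; the values come from the MARA sample further down in this document.

``` python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDefinition:
    """One column of an SAP table (hypothetical structure, for illustration only)."""
    table: str
    field: str
    is_key: bool
    data_type: str
    length: int
    decimals: int
    description: str
    check_table: Optional[str] = None


# Values taken from the MARA sample shown later in this document.
mandt = FieldDefinition("MARA", "MANDT", True, "CLNT", 3, 0, "Client", "T000")
fashgrd = FieldDefinition("MARA", "FASHGRD", False, "CHAR", 4, 0, "Fashion Grade", "T6WFG")

print(f"{mandt.table}.{mandt.field}: {mandt.description} ({mandt.data_type} {mandt.length})")
```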

## Benefits

- **Time Saving**: No more manual documentation lookups
- **Better Understanding**: Clear visibility of table structures and
  meanings
- **Easier Data Analysis**: Quick reference for field names and their
  purposes
- **Knowledge Sharing**: Makes SAP data structures more accessible to
  team members

## Future Directions

### 1. Interactive Interface

- Build a searchable interface for quick table/field lookups
- Implement full-text search across descriptions
- Add semantic search using LLMs to find relevant tables by describing
  needs in plain English
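
A first version of the full-text lookup could be as simple as the sketch below. It assumes the definitions are already available as a list of dictionaries; the `FIELDS` catalogue and the `search_fields` helper are hypothetical names, and the two entries mirror the MARA sample in this document.

``` python
# Hypothetical in-memory catalogue; entries mirror the MARA sample in this document.
FIELDS = [
    {"table": "MARA", "field": "MANDT", "description": "Client"},
    {"table": "MARA", "field": "FASHGRD", "description": "Fashion Grade"},
]


def search_fields(query, fields=FIELDS):
    """Return entries whose field name or description contains every query word."""
    words = query.lower().split()
    hits = []
    for entry in fields:
        haystack = f"{entry['field']} {entry['description']}".lower()
        if all(word in haystack for word in words):
            hits.append(entry)
    return hits


for hit in search_fields("fashion"):
    print(hit["table"], hit["field"], "-", hit["description"])
```

Plain substring matching like this is case-insensitive but not typo-tolerant; the semantic search mentioned above would need a different approach.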

### 2. Visual Data Model

- Create interactive graph visualizations showing table relationships
- Highlight primary/foreign key connections
- Enable visual exploration of the SAP data model
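
The check-table column already gives us the edges of such a graph. The sketch below builds an adjacency mapping from (table, field, check table) triples and renders it as Graphviz DOT text; the helper names are assumptions for illustration, and the two links come from the MARA sample in this document.

``` python
from collections import defaultdict

# (table, field, check table) triples taken from the MARA sample in this document.
CHECK_TABLE_LINKS = [
    ("MARA", "MANDT", "T000"),
    ("MARA", "FASHGRD", "T6WFG"),
]


def build_graph(links):
    """Map each table to the set of check tables its fields point to."""
    graph = defaultdict(set)
    for table, _field, check_table in links:
        graph[table].add(check_table)
    return graph


def to_dot(graph):
    """Render the graph as Graphviz DOT text, one edge per relationship."""
    lines = ["digraph sap {"]
    for table in sorted(graph):
        for target in sorted(graph[table]):
            lines.append(f'    "{table}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


graph = build_graph(CHECK_TABLE_LINKS)
print(to_dot(graph))
```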

### 3. AI-Powered Data Assistant

- Use LLMs with our structured documentation as context
- Generate SQL queries from natural language questions
- Suggest relevant tables for specific business questions
- Provide data model explanations in plain language
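
To use the structured documentation as context, the definitions first have to be rendered as text. The sketch below shows one possible rendering; `build_context` and `build_prompt` are hypothetical helpers, no LLM is called, and the table description ("General Material Data", the standard short text for MARA) is supplied by hand here rather than scraped.

``` python
def build_context(table, description, fields):
    """Render one table definition as plain text that can be placed in a prompt."""
    lines = [f"Table {table}: {description}"]
    for field in fields:
        key_marker = " [key]" if field["key"] else ""
        lines.append(
            f"- {field['name']} ({field['type']}, length {field['length']})"
            f"{key_marker}: {field['text']}"
        )
    return "\n".join(lines)


def build_prompt(question, context):
    """Combine the documentation context with a user question."""
    return (
        "Answer using only the SAP table definitions below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Field values mirror the MARA sample in this document.
mara_fields = [
    {"name": "MANDT", "type": "CLNT", "length": 3, "key": True, "text": "Client"},
    {"name": "FASHGRD", "type": "CHAR", "length": 4, "key": False, "text": "Fashion Grade"},
]

context = build_context("MARA", "General Material Data", mara_fields)
print(build_prompt("Which field stores the fashion grade?", context))
```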

# Our plan

As data scientists working with ERP systems like SAP, we often encounter
large datasets with hundreds of cryptically named columns. To better
understand these data sources, we aim to web scrape SAP’s data
dictionary documentation to create programmatic access to table
definitions, column descriptions, and data types.

## Steps

1.  Scrape column description for one table (for example, MARA)
2.  Bring it into an easily searchable format
3.  Apply process to all tables
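
Step 3 can be sketched as a loop over table names. The URL pattern below is an assumption inferred from the single MARA address used in this document, and `scrape_all` is a hypothetical helper that takes the per-table scraping function as a parameter so that the loop itself stays independent of any HTTP library.

``` python
import time

BASE_URL = "https://www.sapdatasheet.org/abap/tabl/"


def table_url(table_name):
    """Build the documentation URL for a table, assuming the pattern seen for MARA."""
    return f"{BASE_URL}{table_name.strip().lower()}.html"


def scrape_all(table_names, scrape_one, delay_seconds=1.0):
    """Apply scrape_one to every table URL, collecting results and failures separately."""
    results, failures = {}, {}
    for index, name in enumerate(table_names):
        if index:
            time.sleep(delay_seconds)  # pause between requests to go easy on the site
        try:
            results[name] = scrape_one(table_url(name))
        except Exception as exc:  # keep going; one failing table should not stop the run
            failures[name] = exc
    return results, failures


print(table_url("MARA"))
```

Collecting failures instead of raising lets a long run finish and be retried later for only the tables that failed.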

### Scrape column description for one table (for example, MARA)

``` python
url = 'https://www.sapdatasheet.org/abap/tabl/mara.html'
```

``` python
import httpx

response = httpx.get(url)
response
```

    ConnectTimeout: _ssl.c:993: The handshake operation timed out

``` python
sample_text = '''
<table class="table table-sm">
                                <caption class="text-right sapds-alv">
                                    <a href="/download/abap-tabl-component.php?format=csv&amp;tabname=MARA" title="Download components as CSV file.
The downloaded file contains more columns than displayed here." target="_blank">
                                        <img src="/abap/icon/s_wdvtxe.gif"></a> &nbsp;
                                    <a href="/download/abap-tabl-component.php?format=xls&amp;tabname=MARA" title="Download components as Excel 97-2003 Worksheet (.xls) file.
The downloaded file contains more columns than displayed here." target="_blank">
                                        <img src="/abap/icon/s_x__xls.gif"></a> &nbsp;
                                    <a href="/download/abap-tabl-component.php?format=xlsx&amp;tabname=MARA" title="Download components as Excel Open XML Format Spreadsheet (.xlsx) file.
The downloaded file contains more columns than displayed here." target="_blank">
                                        <img src="/abap/icon/s_lisvie.gif"></a> &nbsp;
                                </caption>
                                <thead>
                                    <tr>
                                        <th class="sapds-alv"> <img src="/abap/icon/s_b_pvre.gif"> </th>
                                        <th class="sapds-alv"> Field </th>
                                        <th class="sapds-alv"> Key </th>
                                        <th class="sapds-alv"> Data Element</th>
                                        <th class="sapds-alv"> Domain</th>
                                        <th class="sapds-alv"> Data<br>Type</th>
                                        <th class="sapds-alv"> Length</th>
                                        <th class="sapds-alv"> Decimal<br>Places</th>
                                        <th class="sapds-alv"> Short Description</th>
                                        <th class="sapds-alv"> Check<br>table</th>
                                    </tr>
                                </thead>
                                <tbody>
                                                                            <tr>
                                            <td class="sapds-alv"> <a id="FIELD_MANDT"></a> 1 </td>
                                            <td class="sapds-alv"> <img src="/abap/icon/s_struct.gif">                                                 <a href="/abap/tabl/mara-mandt.html" title="MANDT" target="_blank">MANDT</a> </td>
                                            <td class="sapds-alv text-center"> <input type="checkbox" name="field_MANDT" disabled="disabled" checked="checked"> </td>
                                            <td class="sapds-alv"> <a href="/abap/dtel/mandt.html" title="Client" target="_blank">MANDT</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/mandt.html" title="Client (key field in client-specific tables)" target="_blank">MANDT</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/datatype.html#values" title="Dictionary Data Type" target="_blank">CLNT</a> </td>
                                            <td class="sapds-alv text-right"> 3 &nbsp; </td>
                                            <td class="sapds-alv text-right"> 0 &nbsp; </td>
                                            <td class="sapds-alv"> Client </td>
                                            <td class="sapds-alv"> <a href="/abap/tabl/t000.html" title="Clients" target="_blank">T000</a> </td>
                                        </tr>
                                                                            <tr>
                                            <td class="sapds-alv"> <a id="FIELD_FASHGRD"></a> 239 </td>
                                            <td class="sapds-alv"> <img src="/abap/icon/s_struct.gif">                                                 <a href="/abap/tabl/mara-fashgrd.html" title="FASHGRD" target="_blank">FASHGRD</a> </td>
                                            <td class="sapds-alv text-center"> <input type="checkbox" name="field_FASHGRD" disabled="disabled"> </td>
                                            <td class="sapds-alv"> <a href="/abap/dtel/fashgrd.html" title="Fashion Grade" target="_blank">FASHGRD</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/fashgrd.html" title="Fashion Grade" target="_blank">FASHGRD</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/datatype.html#values" title="Dictionary Data Type" target="_blank">CHAR</a> </td>
                                            <td class="sapds-alv text-right"> 4 &nbsp; </td>
                                            <td class="sapds-alv text-right"> 0 &nbsp; </td>
                                            <td class="sapds-alv"> Fashion Grade </td>
                                            <td class="sapds-alv"> <a href="/abap/tabl/t6wfg.html" title="Degree of Fashion" target="_blank">T6WFG</a> </td>
                                        </tr>
                                                                    </tbody>
                            </table>'''
```
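
The saved sample is useful for developing the parser offline. One detail is worth noting: the Key column holds no text, only a checkbox whose `checked` attribute marks key fields, so reading the cell text alone yields an empty string for every row. The sketch below parses a shortened copy of the sample and records a checked box as `X`. It swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra dependencies; `TableParser` and the trimmed `SNIPPET` are illustrative assumptions.

``` python
from html.parser import HTMLParser

# Shortened copy of the sample above: three columns, two rows.
SNIPPET = """
<table class="table table-sm">
  <thead><tr><th> Field </th><th> Key </th><th> Short Description</th></tr></thead>
  <tbody>
    <tr>
      <td> <a href="/abap/tabl/mara-mandt.html">MANDT</a> </td>
      <td> <input type="checkbox" name="field_MANDT" disabled="disabled" checked="checked"> </td>
      <td> Client </td>
    </tr>
    <tr>
      <td> <a href="/abap/tabl/mara-fashgrd.html">FASHGRD</a> </td>
      <td> <input type="checkbox" name="field_FASHGRD" disabled="disabled"> </td>
      <td> Fashion Grade </td>
    </tr>
  </tbody>
</table>
"""


class TableParser(HTMLParser):
    """Collect header texts and body rows; a checked checkbox becomes 'X'."""

    def __init__(self):
        super().__init__()
        self.headers, self.rows = [], []
        self._cell = None  # text pieces of the cell being read, None outside cells
        self._row = None   # cells of the row being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []
        elif tag == "input" and self._cell is not None and "checked" in dict(attrs):
            self._cell.append("X")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            (self.headers if tag == "th" else self._row).append(text)
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SNIPPET)
print(parser.headers)
print(parser.rows)
```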

``` python
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'lxml')
table = soup.find('table', class_='table table-sm')
```

    NameError: name 'response' is not defined

### Bring it into an easily searchable format

``` python
headers = [th.text.strip() for th in table.find('thead').find_all('th')]
headers
```

    NameError: name 'table' is not defined

``` python
trs = table.find('tbody').find_all('tr')
trs[0]
```

    NameError: name 'table' is not defined

``` python
trs[1]
```

    NameError: name 'trs' is not defined

``` python
[td.text.strip() for td in trs[1].find_all('td')]
```

    NameError: name 'trs' is not defined

``` python
rows = []
for tr in table.find('tbody').find_all('tr'):
    row = [td.text.strip() for td in tr.find_all('td')]
    rows.append(row)

rows[10]
```

    NameError: name 'table' is not defined

``` python
import pandas as pd

df = pd.DataFrame(rows, columns=headers)
df.head()
```

    NameError: name 'headers' is not defined

------------------------------------------------------------------------

<a
href="https://github.com/MIS-Analytics/mis_analytics/blob/main/mis_analytics/desapher.py#L57"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_sap_table_structure

``` python

def get_sap_table_structure(
    url
):

```

*Scrapes SAP table structure from sapdatasheet.org and returns a pandas
DataFrame*

``` python
df = get_sap_table_structure(url)
df.head()
```

    ReadTimeout: The read operation timed out
    [31m---------------------------------------------------------------------------[39m
    [31mReadTimeout[39m                               Traceback (most recent call last)
    [36mFile [39m[32m/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py:101[39m, in [36mmap_httpcore_exceptions[39m[34m()[39m
    [32m    100[39m [38;5;28;01mtry[39;00m:
    [32m--> [39m[32m101[39m     [38;5;28;01myield[39;00m
    [32m    102[39m [38;5;28;01mexcept[39;00m [38;5;167;01mException[39;00m [38;5;28;01mas[39;00m exc:

    [36mFile [39m[32m/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py:250[39m, in [36mHTTPTransport.handle_request[39m[34m(self, request)[39m
    [32m    249[39m [38;5;28;01mwith[39;00m map_httpcore_exceptions():
    [32m--> [39m[32m250[39m     resp = [38;5;28;43mself[39;49m[43m.[49m[43m_pool[49m[43m.[49m[43mhandle_request[49m[43m([49m[43mreq[49m[43m)[49m
    [32m    252[39m [38;5;28;01massert[39;00m [38;5;28misinstance[39m(resp.stream, typing.Iterable)

    File /usr/local/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py:256, in ConnectionPool.handle_request(self, request)
        255     self._close_connections(closing)
    --> 256     raise exc from None

    ...

    File /usr/local/lib/python3.12/site-packages/httpcore/_backends/sync.py:126, in SyncStream.read(self, max_bytes, timeout)
        125 exc_map: ExceptionMapping = {socket.timeout: ReadTimeout, OSError: ReadError}
    --> 126 with map_exceptions(exc_map):
        127     self._sock.settimeout(timeout)

    File /usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py:14, in map_exceptions(map)
         13     if isinstance(exc, from_exc):
    ---> 14         raise to_exc(exc) from exc
         15 raise

    ReadTimeout: The read operation timed out

    The above exception was the direct cause of the following exception:

    ReadTimeout                               Traceback (most recent call last)
    Cell In[24], line 1
    ----> 1 df = get_sap_table_structure(url)
          2 df.head()

    Cell In[23], line 9, in get_sap_table_structure(url)
          5     import httpx
          6     from bs4 import BeautifulSoup
          7     import pandas as pd
          8
    ----> 9     response = httpx.get(url)
         10     soup = BeautifulSoup(response.text, 'lxml')
         11
         12     table = soup.find('table', class_='table table-sm')

    ...

    File /usr/local/lib/python3.12/site-packages/httpx/_transports/default.py:118, in map_httpcore_exceptions()
        115     raise
        117 message = str(exc)
    --> 118 raise mapped_exc(message) from exc

    ReadTimeout: The read operation timed out

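The `ReadTimeout` above is raised by `httpx.get(url)`, which is called without an explicit `timeout`, so httpx's default applies (5 seconds, to the best of our knowledge). One possible workaround is to allow more time and retry a few times. The sketch below is not part of the original notebook: `retry` and `fetch_html` are hypothetical helper names, and the 30-second timeout, three attempts, and doubling wait are arbitrary assumptions.

```python
import time


def retry(func, *, attempts=3, wait=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure."""
    for i in range(attempts):
        try:
            return func()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(wait * 2 ** i)


def fetch_html(url, timeout=30.0, attempts=3):
    """Hypothetical helper: fetch page text with a longer timeout and retries."""
    import httpx  # imported inside the function, as get_sap_table_structure does

    def _get():
        response = httpx.get(url, timeout=timeout, follow_redirects=True)
        response.raise_for_status()  # treat HTTP error statuses as failures
        return response.text

    # httpx.ReadTimeout is a subclass of httpx.TimeoutException.
    return retry(_get, attempts=attempts, retry_on=(httpx.TimeoutException,))
```

Inside `get_sap_table_structure`, the `httpx.get(url)` call could then be swapped for `fetch_html(url)`, with the returned text passed to `BeautifulSoup`. If the site stays unreachable, the final `ReadTimeout` is still raised after the last attempt.
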
``` python
url_2 = 'https://www.sapdatasheet.org/abap/tabl/makt.html'

df_2 = get_sap_table_structure(url_2)
df_2
```

    ReadTimeout: The read operation timed out
    ---------------------------------------------------------------------------
    ReadTimeout                               Traceback (most recent call last)
    File /usr/local/lib/python3.12/site-packages/httpx/_transports/default.py:101, in map_httpcore_exceptions()
        100 try:
    --> 101     yield
        102 except Exception as exc:

    File /usr/local/lib/python3.12/site-packages/httpx/_transports/default.py:250, in HTTPTransport.handle_request(self, request)
        249 with map_httpcore_exceptions():
    --> 250     resp = self._pool.handle_request(req)
        252 assert isinstance(resp.stream, typing.Iterable)

    ...

    File /usr/local/lib/python3.12/site-packages/httpcore/_exceptions.py:14, in map_exceptions(map)
         13     if isinstance(exc, from_exc):
    ---> 14         raise to_exc(exc) from exc
         15 raise

    ReadTimeout: The read operation timed out

    The above exception was the direct cause of the following exception:

    ReadTimeout                               Traceback (most recent call last)
    Cell In[25], line 3
          1 url_2 = 'https://www.sapdatasheet.org/abap/tabl/makt.html'
          2
    ----> 3 df_2 = get_sap_table_structure(url_2)
          4 df_2

    Cell In[23], line 9, in get_sap_table_structure(url)
          5     import httpx
          6     from bs4 import BeautifulSoup
          7     import pandas as pd
          8
    ----> 9     response = httpx.get(url)
         10     soup = BeautifulSoup(response.text, 'lxml')
         11
         12     table = soup.find('table', class_='table table-sm')

    ...

    File /usr/local/lib/python3.12/site-packages/httpx/_transports/default.py:249, in HTTPTransport.handle_request(self, request)
        235 import httpcore
        237 req = httpcore.Request(
        238     method=request.method,
        239     url=httpcore.URL(
       (...)    247     extensions=request.extensions,
        248 )
    --> 249 with map_httpcore_exceptions():
        250     resp = self._pool.handle_request(req)
        252 assert isinstance(resp.stream, typing.Iterable)

    [36mFile [39m[32m/usr/local/lib/python3.12/contextlib.py:158[39m, in [36m_GeneratorContextManager.__exit__[39m[34m(self, typ, value, traceback)[39m
    [32m    156[39m     value = typ()
    [32m    157[39m [38;5;28;01mtry[39;00m:
    [32m--> [39m[32m158[39m     [38;5;28;43mself[39;49m[43m.[49m[43mgen[49m[43m.[49m[43mthrow[49m[43m([49m[43mvalue[49m[43m)[49m
    [32m    159[39m [38;5;28;01mexcept[39;00m [38;5;167;01mStopIteration[39;00m [38;5;28;01mas[39;00m exc:
    [32m    160[39m     [38;5;66;03m# Suppress StopIteration *unless* it's the same exception that[39;00m
    [32m    161[39m     [38;5;66;03m# was passed to throw().  This prevents a StopIteration[39;00m
    [32m    162[39m     [38;5;66;03m# raised inside the "with" statement from being suppressed.[39;00m
    [32m    163[39m     [38;5;28;01mreturn[39;00m exc [38;5;129;01mis[39;00m [38;5;129;01mnot[39;00m value

    [36mFile [39m[32m/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py:118[39m, in [36mmap_httpcore_exceptions[39m[34m()[39m
    [32m    115[39m     [38;5;28;01mraise[39;00m
    [32m    117[39m message = [38;5;28mstr[39m(exc)
    [32m--> [39m[32m118[39m [38;5;28;01mraise[39;00m mapped_exc(message) [38;5;28;01mfrom[39;00m[38;5;250m [39m[34;01mexc[39;00m

    [31mReadTimeout[39m: The read operation timed out

### Apply process to all tables

------------------------------------------------------------------------

<a
href="https://github.com/MIS-Analytics/mis_analytics/blob/main/mis_analytics/desapher.py#L38"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_sap_table_url

``` python

def get_sap_table_url(
    table_name
):

```

*Constructs sapdatasheet.org URL from SAP table name*

``` python
get_sap_table_url('VBAK')
```

    'https://www.sapdatasheet.org/abap/tabl/vbak.html'
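
The example output suggests the URL is just the lowercased table name placed into a fixed path. A minimal sketch of that pattern follows; `sketch_sap_table_url` and `BASE_URL` are hypothetical names used for illustration, and the actual implementation of `get_sap_table_url` may differ.

``` python
# Assumed base path, inferred from the example output above.
BASE_URL = "https://www.sapdatasheet.org/abap/tabl"

def sketch_sap_table_url(table_name: str) -> str:
    """Build a table URL by lowercasing the table name (illustrative only)."""
    return f"{BASE_URL}/{table_name.strip().lower()}.html"

print(sketch_sap_table_url("VBAK"))
```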

------------------------------------------------------------------------

<a
href="https://github.com/MIS-Analytics/mis_analytics/blob/main/mis_analytics/desapher.py#L57"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_sap_table_structure

``` python

def get_sap_table_structure(
    url
):

```

*Scrapes SAP table structure from sapdatasheet.org and returns a pandas
DataFrame.* Returns None if the table is not found or another error occurs.
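
To illustrate the general idea of turning an HTML table into field records, here is a small sketch using only the standard library. The sample HTML is made up, and the real page layout of sapdatasheet.org is not shown in this document, so this is not necessarily how `get_sap_table_structure` parses it. `TableRows` and `parse_field_table` are hypothetical names.

``` python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def parse_field_table(html):
    """Return one dict per body row, or None if no table rows were found."""
    parser = TableRows()
    parser.feed(html)
    if not parser.rows:
        return None
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]

# Made-up sample markup, not copied from the real site.
SAMPLE_HTML = """
<table>
  <tr><th>Field</th><th>DataType</th><th>Short Description</th></tr>
  <tr><td>MANDT</td><td>CLNT</td><td>Client</td></tr>
  <tr><td>MATNR</td><td>CHAR</td><td>Material Number</td></tr>
</table>
"""
records = parse_field_table(SAMPLE_HTML)
print(records)
```

Returning `None` for a page without table rows mirrors the documented behaviour of returning None when the table is not found.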

------------------------------------------------------------------------

<a
href="https://github.com/MIS-Analytics/mis_analytics/blob/main/mis_analytics/desapher.py#L92"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_sap_tables_structure

``` python

def get_sap_tables_structure(
    tables
):

```

*Gets structure for multiple SAP tables and combines them into one
DataFrame with a column indicating the source table*

``` python
tables = ['MARC', 'MARD', 'MARM', 'MBEW']
```

``` python
df = get_sap_tables_structure(tables)
df.sample(10)
```

    Error processing https://www.sapdatasheet.org/abap/tabl/marc.html: The read operation timed out
    Error processing https://www.sapdatasheet.org/abap/tabl/mard.html: The read operation timed out
    Error processing https://www.sapdatasheet.org/abap/tabl/marm.html: The read operation timed out
    Error processing https://www.sapdatasheet.org/abap/tabl/mbew.html: The read operation timed out

    AttributeError: 'NoneType' object has no attribute 'sample'
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[32], line 2
          1 df = get_sap_tables_structure(tables)
    ----> 2 df.sample(10)

    AttributeError: 'NoneType' object has no attribute 'sample'

### Let’s also grab the table description for each table.

``` python
url
```

    'https://www.sapdatasheet.org/abap/tabl/mara.html'

------------------------------------------------------------------------

<a
href="https://github.com/MIS-Analytics/mis_analytics/blob/main/mis_analytics/desapher.py#L107"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_sap_table_description

``` python

def get_sap_table_description(
    url
):

```

*Scrapes SAP table description from sapdatasheet.org.* Returns None if
the description is not found or an error occurs.

``` python
get_sap_table_description(url)
```

    Error getting description from https://www.sapdatasheet.org/abap/tabl/mara.html: Server error '500 Internal Server Error' for url 'https://www.sapdatasheet.org/abap/tabl/mara.html'
    For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
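
Because the site intermittently answers with timeouts or a 500 error, retrying with a growing delay may help. The following is a hedged sketch, not part of the library: `call_with_retries` is a hypothetical helper, and `fetch` stands for any function that raises on failure.

``` python
import time

def call_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on an exception wait and retry. Returns None if all attempts fail."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:
            print(f"Attempt {attempt + 1}/{attempts} failed for {url}: {exc}")
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** attempt)
    return None
```

Here `fetch` could be a small function that calls `httpx.get(url, timeout=30.0)` and then `raise_for_status()` on the response. Wrapping `get_sap_table_description` directly would not retry anything, since it is documented to return None instead of raising.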

``` python
from pathlib import Path
```

``` python
CACHE_PATH = Path('sap_sheet.parquet')
```

``` python
def get_sap_tables_structure(tables, cache_path=CACHE_PATH, force_refresh=False):
    """
    Gets structure for multiple SAP tables, caching results in a single parquet file.
    Only scrapes tables not already in the cache. force_refresh=True re-scrapes everything.
    """
    if force_refresh and cache_path.exists():
        cache_path.unlink()

    # Load existing cache
    if cache_path.exists():
        cached = pd.read_parquet(cache_path)
        already_cached = set(cached['Table'].unique())
    else:
        cached = None
        already_cached = set()

    # Scrape only missing tables
    missing = [t for t in tables if t.upper() not in already_cached]
    new_dfs = []
    for table in missing:
        url = get_sap_table_url(table)
        df = get_sap_table_structure(url)
        if df is not None:
            df['Table'] = table.upper()
            df['Table Description'] = get_sap_table_description(url)
            new_dfs.append(df)

    # Combine and save
    all_dfs = [d for d in [cached] + new_dfs if d is not None]
    if not all_dfs:
        return None
    result = pd.concat(all_dfs, ignore_index=True)
    if new_dfs:  # only write if something new was actually scraped
        result.to_parquet(cache_path, index=False)

    # Return only requested tables
    return result[result['Table'].isin([t.upper() for t in tables])]
```

``` python
sap_sheet = get_sap_tables_structure(tables)
sap_sheet.sample(10)
```

    Error processing https://www.sapdatasheet.org/abap/tabl/marc.html: The read operation timed out
    Error processing https://www.sapdatasheet.org/abap/tabl/mard.html: The read operation timed out
    Error processing https://www.sapdatasheet.org/abap/tabl/marm.html: _ssl.c:993: The handshake operation timed out
    Error processing https://www.sapdatasheet.org/abap/tabl/mbew.html: _ssl.c:993: The handshake operation timed out

    AttributeError: 'NoneType' object has no attribute 'sample'
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[39], line 2
          1 sap_sheet = get_sap_tables_structure(tables)
    ----> 2 sap_sheet.sample(10)

    AttributeError: 'NoneType' object has no attribute 'sample'

``` python
sap_sheet['Table Description'].unique()
```

    TypeError: 'NoneType' object is not subscriptable
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Cell In[40], line 1
    ----> 1 sap_sheet['Table Description'].unique()

    TypeError: 'NoneType' object is not subscriptable

``` python
data = {
    'MANDT': [100, 100, 100],
    'MATNR': ['MAT001', 'MAT002', 'MAT003'],
    'WERKS': ['1000', '1000', '2000'],
    'PSTAT': ['KVEB', 'KVEB', 'KVEB'],
    'LVORM': [None, None, None],
    'BWTTY': [None, None, None],
    'MMSTA': ['1', '1', '9'],
    'MMSTD': [20200101, 20200115, 20210201],
    'MAABC': ['A', 'B', None],
    'KZKRI': [None, None, None],
    'EKGRP': ['100', '100', '200'],
    'DISPO': ['001', '001', '002'],
    'BESKZ': ['E', 'E', 'X'],
    'SOBSL': [None, None, None],
    'EISBE': [10.0, 20.0, 15.0],
    'MABST': [100.0, 200.0, 150.0],
    'ALTSL': [None, None, None],
    'KZAUS': [None, None, None],
    'AUSDT': [0, 0, 0],
    'NFMAT': [None, None, None],
    'KZBED': [None, 'T', None],
    'RGEKZ': [None, None, None],
    'FEVOR': ['G01', None, 'G02'],
    'BASMG': [1.0, 1.0, 1.0],
    'STAWN': ['84141025', '84141025', '84148073'],
    'HERKL': ['DE', 'DE', 'US'],
    'HERKR': ['05', '05', '16'],
    'EXPME': ['ST', 'ST', 'ST'],
    'MTVER': ['1', '1', '1'],
    'PRCTR': ['PC100', 'PC100', 'PC200'],
    'VERKZ': [None, 'X', None],
    'STLAL': [None, None, None],
    'STLAN': [None, None, None],
    'PLNNR': [None, None, None],
    'APLAL': [None, None, None],
    'FRTME': [None, None, None],
    'LGPRO': ['1001', '1001', '2001'],
    'DISGR': ['2000', '2000', '6000'],
    'SERNP': [None, None, None],
    'PREFE': [None, None, None],
    'PRENE': [None, None, None],
    'SCHGT': [None, None, None],
    'MCRUE': ['X', 'X', 'X'],
    'LFGJA': [2024, 2024, 2025],
    'EISLO': [0.0, 0.0, 0.0],
    'TARGET_STOCK': [50.0, 100.0, 75.0],
    'SCM_SCOST': [0.0, 0.0, 0.0],
    'SCM_LSUOM': [None, None, None],
    'SCM_STRA1': [None, None, None],
}

df_marc = pd.DataFrame(data)
df_marc
```

<div>

<table class="dataframe" data-quarto-postprocess="true" data-border="1">
<thead>
<tr style="text-align: right;">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">MANDT</th>
<th data-quarto-table-cell-role="th">MATNR</th>
<th data-quarto-table-cell-role="th">WERKS</th>
<th data-quarto-table-cell-role="th">PSTAT</th>
<th data-quarto-table-cell-role="th">LVORM</th>
<th data-quarto-table-cell-role="th">BWTTY</th>
<th data-quarto-table-cell-role="th">MMSTA</th>
<th data-quarto-table-cell-role="th">MMSTD</th>
<th data-quarto-table-cell-role="th">MAABC</th>
<th data-quarto-table-cell-role="th">KZKRI</th>
<th data-quarto-table-cell-role="th">...</th>
<th data-quarto-table-cell-role="th">PREFE</th>
<th data-quarto-table-cell-role="th">PRENE</th>
<th data-quarto-table-cell-role="th">SCHGT</th>
<th data-quarto-table-cell-role="th">MCRUE</th>
<th data-quarto-table-cell-role="th">LFGJA</th>
<th data-quarto-table-cell-role="th">EISLO</th>
<th data-quarto-table-cell-role="th">TARGET_STOCK</th>
<th data-quarto-table-cell-role="th">SCM_SCOST</th>
<th data-quarto-table-cell-role="th">SCM_LSUOM</th>
<th data-quarto-table-cell-role="th">SCM_STRA1</th>
</tr>
</thead>
<tbody>
<tr>
<td data-quarto-table-cell-role="th">0</td>
<td>100</td>
<td>MAT001</td>
<td>1000</td>
<td>KVEB</td>
<td>None</td>
<td>None</td>
<td>1</td>
<td>20200101</td>
<td>A</td>
<td>None</td>
<td>...</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>X</td>
<td>2024</td>
<td>0.0</td>
<td>50.0</td>
<td>0.0</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">1</td>
<td>100</td>
<td>MAT002</td>
<td>1000</td>
<td>KVEB</td>
<td>None</td>
<td>None</td>
<td>1</td>
<td>20200115</td>
<td>B</td>
<td>None</td>
<td>...</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>X</td>
<td>2024</td>
<td>0.0</td>
<td>100.0</td>
<td>0.0</td>
<td>None</td>
<td>None</td>
</tr>
<tr>
<td data-quarto-table-cell-role="th">2</td>
<td>100</td>
<td>MAT003</td>
<td>2000</td>
<td>KVEB</td>
<td>None</td>
<td>None</td>
<td>9</td>
<td>20210201</td>
<td>None</td>
<td>None</td>
<td>...</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>X</td>
<td>2025</td>
<td>0.0</td>
<td>75.0</td>
<td>0.0</td>
<td>None</td>
<td>None</td>
</tr>
</tbody>
</table>

<p>3 rows × 49 columns</p>
</div>

------------------------------------------------------------------------

<a
href="https://github.com/MIS-Analytics/mis_analytics/blob/main/mis_analytics/desapher.py#L152"
target="_blank" style="float:right; font-size:smaller">source</a>

### convert_sap_types

``` python

def convert_sap_types(
    df:DataFrame, sap_sheet:DataFrame
)->DataFrame:

```

*Converts the columns of the ‘df’ DataFrame to the correct data types
based on ‘sap_sheet’.*

- `df`: pandas DataFrame containing SAP table data.
- `sap_sheet`: pandas DataFrame containing SAP metadata with column data
  types.
- Returns: the converted pandas DataFrame.
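
As a rough illustration of what such a conversion could look like, the sketch below builds a `Field` to `DataType` lookup (both column names appear in the traceback further down) and casts matching columns. The mapping from SAP type codes such as `DATS`, `QUAN`, or `CHAR` to pandas dtypes is an assumption made for this example, and `convert_types_sketch` is a hypothetical name, not the library function.

``` python
import pandas as pd

# Assumed groupings of SAP dictionary types; adjust to the real metadata.
NUMERIC_TYPES = {"DEC", "QUAN", "CURR", "FLTP"}
TEXT_TYPES = {"CHAR", "NUMC", "CLNT", "UNIT", "CUKY"}

def convert_types_sketch(df, sap_sheet):
    """Cast columns of df according to the SAP type listed in sap_sheet."""
    sap_sheet = sap_sheet.rename(columns=lambda x: x.strip())
    mapping = sap_sheet.set_index("Field")["DataType"].to_dict()
    out = df.copy()
    for column, sap_type in mapping.items():
        if column not in out.columns:
            continue  # metadata may describe fields the data does not contain
        if sap_type == "DATS":
            # SAP dates look like 20200101; unparseable values become NaT
            out[column] = pd.to_datetime(
                out[column].astype(str), format="%Y%m%d", errors="coerce"
            )
        elif sap_type in NUMERIC_TYPES:
            out[column] = pd.to_numeric(out[column], errors="coerce")
        elif sap_type in TEXT_TYPES:
            out[column] = out[column].astype("string")
    return out
```

Keeping `NUMC` and `CLNT` as strings is a deliberate choice in this sketch, since values such as plant or client codes can carry leading zeros.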

``` python
df_marc.info()
```

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 49 columns):
     #   Column        Non-Null Count  Dtype  
    ---  ------        --------------  -----  
     0   MANDT         3 non-null      int64  
     1   MATNR         3 non-null      object 
     2   WERKS         3 non-null      object 
     3   PSTAT         3 non-null      object 
     4   LVORM         0 non-null      object 
     5   BWTTY         0 non-null      object 
     6   MMSTA         3 non-null      object 
     7   MMSTD         3 non-null      int64  
     8   MAABC         2 non-null      object 
     9   KZKRI         0 non-null      object 
     10  EKGRP         3 non-null      object 
     11  DISPO         3 non-null      object 
     12  BESKZ         3 non-null      object 
     13  SOBSL         0 non-null      object 
     14  EISBE         3 non-null      float64
     15  MABST         3 non-null      float64
     16  ALTSL         0 non-null      object 
     17  KZAUS         0 non-null      object 
     18  AUSDT         3 non-null      int64  
     19  NFMAT         0 non-null      object 
     20  KZBED         1 non-null      object 
     21  RGEKZ         0 non-null      object 
     22  FEVOR         2 non-null      object 
     23  BASMG         3 non-null      float64
     24  STAWN         3 non-null      object 
     25  HERKL         3 non-null      object 
     26  HERKR         3 non-null      object 
     27  EXPME         3 non-null      object 
     28  MTVER         3 non-null      object 
     29  PRCTR         3 non-null      object 
     30  VERKZ         1 non-null      object 
     31  STLAL         0 non-null      object 
     32  STLAN         0 non-null      object 
     33  PLNNR         0 non-null      object 
     34  APLAL         0 non-null      object 
     35  FRTME         0 non-null      object 
     36  LGPRO         3 non-null      object 
     37  DISGR         3 non-null      object 
     38  SERNP         0 non-null      object 
     39  PREFE         0 non-null      object 
     40  PRENE         0 non-null      object 
     41  SCHGT         0 non-null      object 
     42  MCRUE         3 non-null      object 
     43  LFGJA         3 non-null      int64  
     44  EISLO         3 non-null      float64
     45  TARGET_STOCK  3 non-null      float64
     46  SCM_SCOST     3 non-null      float64
     47  SCM_LSUOM     0 non-null      object 
     48  SCM_STRA1     0 non-null      object 
    dtypes: float64(6), int64(4), object(39)
    memory usage: 1.3+ KB

``` python
df_marc_converted = convert_sap_types(df_marc, sap_sheet)
df_marc_converted.head()
```

    AttributeError: 'NoneType' object has no attribute 'rename'
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[45], line 1
    ----> 1 df_marc_converted = convert_sap_types(df_marc, sap_sheet)
          2 df_marc_converted.head()

    Cell In[43], line 9, in convert_sap_types(df, sap_sheet)
          5     :param df: Pandas DataFrame containing SAP table data (e.g., "df").
          6     :param sap_sheet: Pandas DataFrame containing SAP metadata with column data types.
          7     :return: Converted Pandas DataFrame.
          8     """
    ----> 9     sap_sheet = sap_sheet.rename(columns=lambda x: x.strip())
         10     column_type_mapping = sap_sheet.set_index("Field")["DataType"].to_dict()
         11 
         12     for column, sap_type in column_type_mapping.items():

    AttributeError: 'NoneType' object has no attribute 'rename'

``` python
df_marc_converted.info()
```

    NameError: name 'df_marc_converted' is not defined
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Cell In[46], line 1
    ----> 1 df_marc_converted.info()

    NameError: name 'df_marc_converted' is not defined

------------------------------------------------------------------------

<a
href="https://github.com/MIS-Analytics/mis_analytics/blob/main/mis_analytics/desapher.py#L178"
target="_blank" style="float:right; font-size:smaller">source</a>

### rename_sap_columns

``` python

def rename_sap_columns(
    df:DataFrame, sap_sheet:DataFrame
)->DataFrame:

```

*Renames the columns in the ‘df’ DataFrame using the ‘Short Description’
from ‘sap_sheet’.* If a column is not found in ‘sap_sheet’, it remains
unchanged.

- `df`: pandas DataFrame containing SAP table data.
- `sap_sheet`: pandas DataFrame containing SAP metadata with column
  names and short descriptions.
- Returns: the DataFrame with renamed columns.
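
A minimal sketch of this renaming behaviour, assuming `sap_sheet` has the columns `Field` and `Short Description`: build a lookup and pass it to `DataFrame.rename`, which leaves unknown columns untouched. `rename_columns_sketch` is a hypothetical name, and taking the first description for a duplicated field is an arbitrary choice made only for this example.

``` python
import pandas as pd

def rename_columns_sketch(df, sap_sheet):
    """Rename df columns via a Field -> Short Description lookup."""
    lookup = (
        sap_sheet.drop_duplicates(subset="Field", keep="first")
        .set_index("Field")["Short Description"]
        .to_dict()
    )
    # Columns missing from the lookup keep their original name.
    return df.rename(columns=lookup)

demo = pd.DataFrame({"MATNR": ["MAT001"], "WERKS": ["1000"], "TARGET_STOCK": [50.0]})
meta = pd.DataFrame({
    "Field": ["MATNR", "WERKS", "MATNR"],
    "Short Description": ["Material Number", "Plant", "Material"],
})
print(list(rename_columns_sketch(demo, meta).columns))
```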

``` python
df_marc_converted = rename_sap_columns(df_marc_converted, sap_sheet)
df_marc_converted.info()
```

    NameError: name 'df_marc_converted' is not defined
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Cell In[48], line 1
    ----> 1 df_marc_converted = rename_sap_columns(df_marc_converted, sap_sheet)
          2 df_marc_converted.info()

    NameError: name 'df_marc_converted' is not defined

``` python
df_marc_converted = clean_col_names(df_marc_converted)
df_marc_converted.info()
```

    NameError: name 'df_marc_converted' is not defined
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Cell In[49], line 1
    ----> 1 df_marc_converted = clean_col_names(df_marc_converted)
          2 df_marc_converted.info()

    NameError: name 'df_marc_converted' is not defined

# Next steps

- We should write some kind of setup functionality that scrapes the
  sap_sheet once and then keeps it as a lookup reference.

- We have duplicates in the `Field` column, because the same column can
  appear in multiple tables.

- We also have duplicates in the `Short Description` column, sometimes
  even when the `Field` value isn’t the same.

- We could combine columns, for example `Field`, `Table`, and
  `Short Description`, to get a unique value. But currently we match
  only on the `Field` column, so we cannot decide which description to
  take. We either have to set a sensible default or need a way to know
  which table a column originated from. The second option seems viable
  only if we process columns from a single SAP table at a time.

- The sensible default could be based on the number of times the table
  is used in other tables (the more usages, the better).
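
One possible reading of that default is sketched below: count how often each table is referenced as a check table, then take each field's description from the most-referenced table that contains it. The `Check Table` column name and the function `pick_default_descriptions` are assumptions for illustration, and the sample rows are invented.

``` python
from collections import Counter

def pick_default_descriptions(rows):
    """rows: dicts with 'Field', 'Table', 'Short Description', 'Check Table'."""
    # How often is each table referenced as a check table by any field?
    usage = Counter(r["Check Table"] for r in rows if r.get("Check Table"))
    best = {}
    for r in rows:
        score = usage[r["Table"]]
        field = r["Field"]
        # Keep the description from the most-referenced table; ties keep the first seen.
        if field not in best or score > best[field][0]:
            best[field] = (score, r["Short Description"])
    return {field: desc for field, (_, desc) in best.items()}

# Invented sample metadata rows.
rows = [
    {"Field": "MATNR", "Table": "MARC", "Short Description": "Material", "Check Table": "MARA"},
    {"Field": "MATNR", "Table": "MARA", "Short Description": "Material Number", "Check Table": None},
    {"Field": "WERKS", "Table": "MARC", "Short Description": "Plant", "Check Table": "T001W"},
    {"Field": "MATNR", "Table": "MARD", "Short Description": "Material No.", "Check Table": "MARA"},
]
print(pick_default_descriptions(rows))
```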

ok, how to best cache the data to not always scrape the data at each run

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

A few good options, from simplest to most robust:

1.  **Save to Parquet/CSV** — scrape once, save to file, load from file
    on subsequent runs. Parquet is nice because it preserves dtypes.
2.  **Pickle the DataFrame** — similar but keeps all Python types
    exactly.
3.  **SQLite** — good if you want to query the data without loading it
    all into memory.

For your use case, I’d recommend **Parquet** — it’s compact, fast,
preserves types well, and you can easily inspect it with pandas. The
pattern would be: check if the file exists → if yes, load it; if not,
scrape and save.

Something like:

``` python
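from pathlib import Path

import pandas as pd

CACHE_PATH = Path('sap_sheet.parquet')

def get_or_scrape_sap_tables(tables, cache_path=CACHE_PATH):
    # Reuse the cached file if it exists
    if cache_path.exists():
        return pd.read_parquet(cache_path)
    # Otherwise scrape; the result is None when every table failed
    df = get_sap_tables_structure(tables)
    if df is not None:
        df.to_parquet(cache_path, index=False)
    return df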
```

You could also add a `force_refresh=False` parameter to re-scrape when
needed. Want to go with this approach, or would you prefer something
different?

what if it is not the same tables every time?

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

Good point! The simplest solution is to **cache per table** — one file
per SAP table. Then you only scrape tables you haven’t seen before:

``` python
CACHE_DIR = Path('sap_cache')
CACHE_DIR.mkdir(exist_ok=True)

def get_or_scrape_sap_table(table):
    table = table.upper()  # normalise so file names and the Table column match
    cache_file = CACHE_DIR / f"{table}.parquet"
    if cache_file.exists():
        return pd.read_parquet(cache_file)
    url = get_sap_table_url(table)
    df = get_sap_table_structure(url)
    if df is not None:
        df['Table'] = table
        df['Table Description'] = get_sap_table_description(url)
        df.to_parquet(cache_file, index=False)
    return df
```

Then
[`get_sap_tables_structure`](https://MIS-Analytics.github.io/mis_analytics/desapher.html#get_sap_tables_structure)
just calls this per table and concatenates. This way any previously
scraped table is free, and only new ones hit the web.

Would you like to wire this into
[`get_sap_tables_structure`](https://MIS-Analytics.github.io/mis_analytics/desapher.html#get_sap_tables_structure),
or would you prefer to try that yourself?

maybe there is a middle ground where we only have one parquet file which
has all the tables downloaded so far. then new ones are appended and a
force_refresh deletes and recreates the file

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

That’s a nice middle ground! The logic would be:

1.  If cache exists, load it and check which of the requested tables are
    already in it
2.  Scrape only the missing ones
3.  Append and save back
4.  `force_refresh=True` nukes the file and re-scrapes everything

Want to try writing that yourself, or would you like me to write it out?

write it for me

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

``` python
CACHE_PATH = Path('sap_sheet.parquet')

def get_sap_tables_structure(tables, cache_path=CACHE_PATH, force_refresh=False):
    """
    Gets structure for multiple SAP tables, caching results in a single parquet file.
    Only scrapes tables not already in the cache. force_refresh=True re-scrapes everything.
    """
    if force_refresh and cache_path.exists():
        cache_path.unlink()

    # Load existing cache
    if cache_path.exists():
        cached = pd.read_parquet(cache_path)
        already_cached = set(cached['Table'].unique())
    else:
        cached = None
        already_cached = set()

    # Scrape only missing tables
    missing = [t for t in tables if t.upper() not in already_cached]
    new_dfs = []
    for table in missing:
        url = get_sap_table_url(table)
        df = get_sap_table_structure(url)
        if df is not None:
            df['Table'] = table.upper()
            df['Table Description'] = get_sap_table_description(url)
            new_dfs.append(df)

    # Combine and save
    all_dfs = [d for d in [cached] + new_dfs if d is not None]
    if not all_dfs:
        return None
    result = pd.concat(all_dfs, ignore_index=True)
    if new_dfs:  # only write if something new was actually scraped
        result.to_parquet(cache_path, index=False)

    # Return only requested tables
    return result[result['Table'].isin([t.upper() for t in tables])]
```

The key things it does:

- Loads the cache and checks what’s already there
- Only scrapes the missing tables
- Appends new tables and saves back
- Returns only the tables you asked for (even if the cache has more)
- `force_refresh=True` deletes the cache and starts fresh
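
Since tables that fail to scrape are skipped, and the function returns `None` when nothing could be loaded at all, it can help to check the result before using it. A small sketch follows; `tables_not_returned` is a hypothetical helper, not part of the library.

``` python
def tables_not_returned(requested, returned_table_names):
    """List the requested tables (uppercased) that are absent from the result."""
    have = {str(name).upper() for name in returned_table_names}
    return [t.upper() for t in requested if t.upper() not in have]

tables = ['MARC', 'MARD', 'MARM', 'MBEW']
# Pretend only two of the four tables could be scraped.
print(tables_not_returned(tables, ['MARC', 'MARM']))
```

With the real result, the second argument could be `[]` when `sap_sheet` is `None` and `sap_sheet['Table'].unique()` otherwise, so the follow-up cells run only once every requested table is present.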
