MVG Observatory

A toolset to observe the MVG subways

Datasource

The data located at data.mvg.auch.cool is sourced from an MVG API endpoint. It contains the departures from all Munich subway stations. Each station is scraped once a minute, and all available departures for that station are stored. If a request to the API is not successful (e.g. an error is received or the request times out), no retry is made. The scraper has a request limit of $number-subway-stations requests per minute. This ensures the API is not overloaded in case it already struggles to serve requests for other reasons.
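
The "one attempt per station, no retry" behaviour described above can be sketched as follows. This is only an illustration, not the scraper's actual code: `run_cycle` and the injected `fetch` callable are hypothetical names, and scheduling the cycle once a minute is left to the caller.

```python
from typing import Callable, Dict, Iterable, Optional


def run_cycle(
    station_ids: Iterable[str],
    fetch: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Run one scrape cycle: exactly one attempt per station, never a retry.

    `fetch` is a hypothetical callable that returns the raw response text
    for a station ID and raises on errors or timeouts.
    """
    results: Dict[str, Optional[str]] = {}
    for station_id in station_ids:
        try:
            results[station_id] = fetch(station_id)
        except Exception:
            # A failed or timed-out request is recorded as a gap and skipped,
            # so a struggling API never receives extra traffic.
            results[station_id] = None
    return results
```

Because each cycle issues at most one request per station, calling it once a minute keeps the request rate at the number of stations per minute.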

Structure of the data

Each morning at 4 AM CE(S)T, an archive is created that contains all the request and response data of the previous day. The archive is created with tar and compressed with Zstandard. Each archive contains a root folder named in the format YYYYMMDD, which contains one subfolder per station with the station ID as the folder name. Each station folder then contains the data of all requests made for that station on that day. Each file name starts with the Unix timestamp of the request, followed by either _body.json or _meta.json.


            20240615/
            ├── de:09162:1
            │   ├── 1718409659_body.json
            │   ├── 1718409659_meta.json
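
As a minimal sketch of how this layout can be consumed, the following splits a path from an extracted archive into its parts. The names `ArchiveEntry` and `parse_entry` are hypothetical and not part of the toolset. The sketch assumes paths use forward slashes, as in the listing above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ArchiveEntry:
    day: str                # root folder name, YYYYMMDD
    station_id: str         # station folder name, e.g. "de:09162:1"
    requested_at: datetime  # request time taken from the file name, in UTC
    kind: str               # "body" or "meta"


def parse_entry(path: str) -> ArchiveEntry:
    """Split 'YYYYMMDD/<station id>/<unix timestamp>_<kind>.json' into fields."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3:
        raise ValueError(f"unexpected path depth: {path!r}")
    day, station_id, filename = parts
    stamp, _, suffix = filename.partition("_")
    if suffix not in ("body.json", "meta.json"):
        raise ValueError(f"unexpected file name: {filename!r}")
    requested_at = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return ArchiveEntry(day, station_id, requested_at, suffix.split(".")[0])
```

Entries that share a station ID and timestamp belong to the same request, so the body and meta files can be paired on those two fields.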
          

The *_meta.json files contain all the metadata of the request. See the following file as an example:


            {
              "appconnect_time": 5.9e-05,
              "connect_time": 5.9e-05,
              "headers": {
                "date": "Sat, 15 Jun 2024 00:00:59 GMT",
                "x-frame-options": "SAMEORIGIN",
                "strict-transport-security": "max-age=31536000",
                "pragma": "no-cache",
                "cache-control": "no-cache, no-store, max-age=0, must-revalidate",
                "expires": "0",
                "content-type": "application/json;charset=UTF-8",
                "server": "SWM Webserver",
                "set-cookie": "REDACTED"
              },
              "httpauth_avail": 0,
              "namelookup_time": 5.9e-05,
              "pretransfer_time": 0.00018,
              "primary_ip": "REDACTED",
              "redirect_count": 0,
              "redirect_url": null,
              "request_params": {
                "globalId": "de:09162:1"
              },
              "request_header": {
                "User-Agent": "REDACTED"
              },
              "request_size": 181,
              "request_url": "REDACTED",
              "response_code": 200,
              "return_code": "ok",
              "return_message": "No error",
              "size_download": 10905.0,
              "size_upload": 0.0,
              "starttransfer_time": 0.910849,
              "total_time": 0.91127
            }
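
A small sketch of how such a file might be reduced to the fields needed for analysis. The function name `summarize_meta` is hypothetical. Treating `"return_code": "ok"` together with `"response_code": 200` as a successful request is an assumption based only on the single sample above.

```python
import json


def summarize_meta(meta_text: str) -> dict:
    """Reduce a *_meta.json document to a few fields of interest."""
    meta = json.loads(meta_text)
    # Assumption: "ok" marks a completed transfer, as in the sample file.
    connected = meta.get("return_code") == "ok"
    status = meta.get("response_code")
    return {
        "station_id": meta.get("request_params", {}).get("globalId"),
        "connected": connected,
        "status": status,
        # Assumption: only HTTP 200 carries the expected JSON body.
        "ok": connected and status == 200,
        "seconds": meta.get("total_time"),
    }
```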
          

The *_body.json file contains the raw, unprocessed response to the request. It can therefore be either the expected JSON response (as shown below) or a plain-text error from the web server. The status code in the _meta.json file should indicate what to expect (if a connection was established).


            [
              {
                "plannedDepartureTime": 1718409600000,
                "realtime": false,
                "realtimeDepartureTime": 1718409600000,
                "transportType": "BUS",
                "label": "N40",
                "divaId": "33N40",
                "network": "swm",
                "trainType": "",
                "destination": "Klinikum Großhadern",
                "cancelled": false,
                "sev": false,
                "stopPositionNumber": 8,
                "messages": [],
                "bannerHash": "",
                "occupancy": "LOW",
                "stopPointGlobalId": "de:09162:1:7:7"
              },
              {
                "plannedDepartureTime": 1718409660000,
                "realtime": true,
                "delayInMinutes": 0,
                "realtimeDepartureTime": 1718409660000,
                "transportType": "BUS",
                "label": "N40",
                "divaId": "33N40",
                "network": "swm",
                "trainType": "",
                "destination": "Kieferngarten",
                "cancelled": false,
                "sev": false,
                "stopPositionNumber": 9,
                "messages": [],
                "bannerHash": "",
                "occupancy": "LOW",
                "stopPointGlobalId": "de:09162:1:6:6"
              },
              ...
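
Because a body may be either a JSON array or a plain-text error, a reader has to handle both cases. The following is a minimal sketch with the hypothetical name `parse_departures`. It assumes the departure times are Unix timestamps in milliseconds, which matches the sample values (1718409600000 is 15 June 2024, 00:00:00 UTC), and it only uses fields that appear in the sample.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def parse_departures(body_text: str) -> Optional[List[dict]]:
    """Return simplified departures, or None if the body is not a JSON array."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return None  # e.g. a plain-text error page from the web server
    if not isinstance(payload, list):
        return None
    departures = []
    for item in payload:
        if not isinstance(item, dict) or "plannedDepartureTime" not in item:
            continue  # skip entries that do not look like a departure
        planned_ms = item["plannedDepartureTime"]
        # Fall back to the planned time when no realtime value is present.
        actual_ms = item.get("realtimeDepartureTime", planned_ms)
        departures.append({
            "label": item.get("label"),
            "destination": item.get("destination"),
            "transport_type": item.get("transportType"),
            "planned": datetime.fromtimestamp(planned_ms / 1000, tz=timezone.utc),
            "delay_seconds": (actual_ms - planned_ms) // 1000,
            "cancelled": item.get("cancelled", False),
        })
    return departures
```

The sample shows that a station's response can include other transport types such as BUS, so callers interested only in subways would filter on `transport_type` afterwards.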
          

The documentation is a work in progress and will be extended in the future.