Basic functions

`open_mastr.Mastr`

Mastr is used to download the MaStR database and keep it up-to-date.

A SQL database is used to mirror the MaStR database. It can be filled with data either from the MaStR bulk download or from the MaStR API.

Example

from open_mastr import Mastr

db = Mastr()
db.download()

PARAMETER	DESCRIPTION
`engine`	Defines the engine of the database to which the MaStR is mirrored. TYPE: `(sqlite, Engine)` DEFAULT: `'sqlite'`
`connect_to_translated_db`	Allows connection to an existing translated database. Only for 'sqlite'-type engines. DEFAULT: `False`

`download(method='bulk', data=None, date=None, bulk_cleansing=True, api_processes=None, api_limit=50, api_chunksize=1000, api_data_types=None, api_location_types=None, **kwargs)`

Download the MaStR either via the bulk download or via the MaStR API and write it to a SQLite database.

PARAMETER DESCRIPTION

method

Either "API" or "bulk". Determines whether the data is downloaded via the zipped bulk download or via the MaStR API. The latter requires an account at marktstammdatenregister.de (see the Configuration section). Defaults to 'bulk'.

TYPE: API or bulk DEFAULT: 'bulk'

data

Determines which types of data are written to the database. If None, all data is used. If it is a list, the possible entries for each download method are listed below; categories marked "No" are still under development. If only one type of data is of interest, it can be given as a string. Defaults to None, which includes all data.

Data	Bulk	API
"wind"	Yes	Yes
"solar"	Yes	Yes
"biomass"	Yes	Yes
"hydro"	Yes	Yes
"gsgk"	Yes	Yes
"combustion"	Yes	Yes
"nuclear"	Yes	Yes
"gas"	Yes	Yes
"storage"	Yes	Yes
"electricity_consumer"	Yes	No
"location"	Yes	Yes
"market"	Yes	No
"grid"	Yes	No
"balancing_area"	Yes	No
"permit"	Yes	Yes
"deleted_units"	Yes	No
"retrofit_units"	Yes	No

TYPE: str or list or None DEFAULT: None
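The availability table above can be checked before starting a long download. The following is a minimal sketch, not part of open_mastr: the helper `unsupported_data` and the two sets are hypothetical names, and the sets are simply transcribed from the table.

```python
# Availability per download method, transcribed from the table above.
# These names are illustrative and are not part of the open_mastr API.
BOTH_METHODS = {
    "wind", "solar", "biomass", "hydro", "gsgk", "combustion",
    "nuclear", "gas", "storage", "location", "permit",
}
BULK_ONLY = {
    "electricity_consumer", "market", "grid",
    "balancing_area", "deleted_units", "retrofit_units",
}


def unsupported_data(data, method="bulk"):
    """Return the requested categories that the chosen method cannot provide."""
    if data is None:
        return []  # None means "all data available for the method"
    # A single category may be passed as a plain string.
    requested = [data] if isinstance(data, str) else list(data)
    allowed = (BOTH_METHODS | BULK_ONLY) if method == "bulk" else BOTH_METHODS
    return [name for name in requested if name not in allowed]


print(unsupported_data(["wind", "market"], method="API"))  # ['market']
```

An empty result means every requested category is listed as available for that method.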

date

date	Bulk	API
"today"	latest files are downloaded from marktstammdatenregister.de	-
"20230101"	If a file from this date exists locally, it is used. Otherwise an error is raised (only today's data can be retrieved from the server)	-
"existing"	Use the latest downloaded zipped XML files; an error is raised if the bulk download folder is empty	-
"latest"	-	Retrieve data that is newer than the newest data already in the table
datetime.datetime(2020, 11, 27)	-	Retrieve data that is newer than this time stamp
None	set date="today"	set date="latest"

Defaults to None.

TYPE: None or `datetime.datetime` or str DEFAULT: None
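The way `date=None` falls back to a method-specific value can be written out as a small function. This is a sketch of the table above only, not the library's own code: `resolve_date` is a hypothetical name, and treating the "-" cells as "not applicable" (raising an error) is an assumption.

```python
import datetime


def resolve_date(date, method="bulk"):
    """Map the `date` argument to an effective value, following the table."""
    if date is None:
        # None falls back to "today" for bulk and "latest" for the API.
        return "today" if method == "bulk" else "latest"
    if method == "bulk":
        # Bulk accepts "today", "existing", or a date string such as "20230101".
        if isinstance(date, str) and (
            date in {"today", "existing"} or (len(date) == 8 and date.isdigit())
        ):
            return date
    else:
        # The API accepts "latest" or a datetime.datetime time stamp.
        if date == "latest" or isinstance(date, datetime.datetime):
            return date
    raise ValueError(f"date={date!r} is not listed for method={method!r}")


print(resolve_date(None, method="bulk"))  # today
print(resolve_date(None, method="API"))   # latest
```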

bulk_cleansing

If True, data cleansing is applied after the download (which is recommended). Defaults to True.

TYPE: bool DEFAULT: True

api_processes

Number of parallel processes used to download additional data. Defaults to None. If set to "max", the maximum number of possible processes is used.

Warning

The implementation of parallel processes is currently under construction. Please leave the argument api_processes at its default value None.

TYPE: int or None or "max" DEFAULT: None

api_limit

Limits the number of units for which data is downloaded. Defaults to 50. If None, data is queried for existing data requests, for example those created by create_additional_data_requests. Note: The number of requests allowed per day is limited, so setting api_limit to a value is recommended.

TYPE: int or None DEFAULT: 50

api_chunksize

Data is downloaded and inserted into the database in chunks of api_chunksize. Defaults to 1000.

TYPE: int or None DEFAULT: 1000

api_data_types

Select the type of additional data that should be retrieved. Choose from "unit_data", "eeg_data", "kwk_data", "permit_data". Defaults to all.

TYPE: list or None DEFAULT: None

api_location_types

Select the type of location that should be retrieved. Choose from "location_elec_generation", "location_elec_consumption", "location_gas_generation", "location_gas_consumption". Defaults to all.

TYPE: list or None DEFAULT: None
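To show how the API-related parameters combine, here is a hedged usage sketch. It uses only the parameters documented above; the wrapper function `download_wind_via_api` and the chosen values are illustrative, and running it assumes open_mastr is installed and an account is configured.

```python
def download_wind_via_api(limit=100):
    """Fetch a limited batch of wind data through the MaStR API."""
    # Imported lazily so that defining this function does not require
    # open_mastr to be installed.
    from open_mastr import Mastr

    db = Mastr()
    db.download(
        method="API",                              # needs a configured account
        data="wind",                               # one category as a string
        api_limit=limit,                           # stay within the daily request quota
        api_data_types=["unit_data", "eeg_data"],  # subset of the additional data
        # api_processes is left at its default None, as the warning recommends.
    )
    return db
```

Calling `download_wind_via_api()` returns the `Mastr` instance, so its `engine` can be used for queries afterwards.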

`to_csv(tables=None, chunksize=500000, limit=None)`

Save the database as CSV files along with the metadata file. If `tables=None`, all possible tables are exported.

PARAMETER	DESCRIPTION
`tables`	For exporting selected tables choose from: ["wind", "solar", "biomass", "hydro", "gsgk", "combustion", "nuclear", "storage", "balancing_area", "electricity_consumer", "gas_consumer", "gas_producer", "gas_storage", "gas_storage_extended", "grid_connections", "grids", "market_actors", "market_roles", "locations_extended", "permit", "deleted_units"] TYPE: `list` DEFAULT: `None`
`chunksize`	Defines the chunk size of the table export, that is, the number of rows included in each chunk. TYPE: `int` DEFAULT: `500000`
`limit`	Limits the number of exported data rows. TYPE: `int` DEFAULT: `None`
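The roles of `chunksize` and `limit` can be illustrated with a standard-library sketch of a chunked export. This is not how `to_csv` is implemented in open_mastr; the function `export_table_to_csv`, the toy `wind` table, and its columns are assumptions made for the example.

```python
import csv
import os
import sqlite3
import tempfile


def export_table_to_csv(conn, table, path, chunksize=500_000, limit=None):
    """Write `table` to `path` chunk by chunk; return the number of rows written."""
    query = f'SELECT * FROM "{table}"'
    if limit is not None:
        query += f" LIMIT {int(limit)}"  # cap the total number of exported rows
    cursor = conn.execute(query)
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column[0] for column in cursor.description])  # header
        while True:
            rows = cursor.fetchmany(chunksize)  # at most `chunksize` rows in memory
            if not rows:
                break
            writer.writerows(rows)
            written += len(rows)
    return written


# Toy database standing in for the mirrored MaStR tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wind (unit_id TEXT, capacity REAL)")
conn.executemany(
    "INSERT INTO wind VALUES (?, ?)", [(f"U{i}", i * 1.5) for i in range(5)]
)

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "wind.csv")
    rows_written = export_table_to_csv(conn, "wind", target, chunksize=2, limit=3)
    with open(target, newline="", encoding="utf-8") as handle:
        exported = list(csv.reader(handle))

print(rows_written)  # 3
```

A smaller chunk size lowers peak memory use at the cost of more fetch calls, while `limit` bounds the total output regardless of the chunk size.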

`translate()`

Translates the currently connected database, renames it with a '-translated' suffix, and updates the path of self.engine accordingly.

Existing translated versions of the currently connected database are deleted.

A database can be translated only once.

Example

from open_mastr import Mastr
import pandas as pd

db = Mastr()
db.download(data='biomass')
db.translate()

# pd.read_sql takes the table name (or a query) as its first argument, `sql`
df = pd.read_sql('biomass_extended', con=db.engine)
print(df.head(10))