The MicrosoftSentinelProvider class provides a way to interact with the Microsoft Sentinel data lake, allowing you to perform operations such as listing databases, reading tables, and saving data. The class is designed to work with Spark sessions in Jupyter notebooks and provides methods to access and manipulate data stored in the Microsoft Sentinel data lake.
The class is part of the sentinel_lake.providers module. To use it, import the class and create an instance using the spark session:
from sentinel_lake.providers import MicrosoftSentinelProvider
data_provider = MicrosoftSentinelProvider(spark)
You must have the necessary permissions to perform operations such as reading and writing data. For more information on permissions, see Microsoft Sentinel data lake permissions.
Methods
The MicrosoftSentinelProvider class provides several methods to interact with the Microsoft Sentinel data lake. Each method listed below assumes the MicrosoftSentinelProvider class has been imported and an instance has been created using the spark session as follows:
from sentinel_lake.providers import MicrosoftSentinelProvider
data_provider = MicrosoftSentinelProvider(spark)
list_databases
List all available databases (Microsoft Sentinel workspaces).
data_provider.list_databases()
Returns:
list[str]: A list of database names (workspaces) available in the Microsoft Sentinel data lake.
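Example:
Capture the returned list and print each workspace name. A minimal sketch; the variable names are illustrative:
# List every database (workspace) visible to the provider.
databases = data_provider.list_databases()
for db in databases:
    print(db)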
list_tables
List all tables in a given database.
data_provider.list_tables([database], [id])
Parameters:
- database (str, optional): The name of the database (workspace) to list tables from. Default value: default.
- id (str, optional): The unique identifier of the database if workspace names aren't unique.
Returns:
list[str]: A list of table names in the specified database.
Examples:
List all tables in the default database:
data_provider.list_tables()
List all tables in a specific database. Specify the id of the database if your workspace names aren't unique:
data_provider.list_tables("workspace1", id="ab1111112222ab333333")
read_table
Load a DataFrame from a table in the data lake.
data_provider.read_table({table_name}, [database], [id])
Parameters:
- table_name (str): The name of the table to read.
- database (str, optional): The name of the database (workspace) containing the table. Default value: default.
- id (str, optional): The unique identifier of the database if workspace names aren't unique.
Returns:
DataFrame: A DataFrame containing the data from the specified table.
Example:
df = data_provider.read_table("EntraGroups", "default")
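The returned object is a regular Spark DataFrame, so standard PySpark operations apply. A minimal sketch, assuming the table exposes a TimeGenerated column (column names vary by table):
# Inspect the five most recent rows by TimeGenerated.
df = data_provider.read_table("EntraGroups", "default")
df.orderBy(df["TimeGenerated"].desc()).show(5)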
save_as_table
Write a DataFrame as a managed table. You can write to the data lake tier by using the _SPRK suffix in your table name, or to the analytics tier by using the _SPRK_CL suffix.
data_provider.save_as_table({DataFrame}, {table_name}, [database], [id], [write_options])
Parameters:
- DataFrame (DataFrame): The DataFrame to write as a table.
- table_name (str): The name of the table to create or overwrite.
- database (str, optional): The name of the database (workspace) to save the table in. Default value: default.
- id (str, optional): The unique identifier of the database if workspace names aren't unique.
- write_options (dict, optional): Options for writing the table. Supported options:
  - mode: append or overwrite (default: append)
  - partitionBy: list of columns to partition by
  Example: {'mode': 'append', 'partitionBy': ['date']}
Returns:
str: The run ID of the write operation.
Note
The partitioning option applies only to custom tables in the default database (workspace) in the data lake tier. It isn't supported for tables in the analytics tier, or for tables in databases other than the default database in the data lake tier.
Examples:
Create a new custom table in the data lake tier in the lakeworkspace workspace:
data_provider.save_as_table(dataframe, "CustomTable1_SPRK", "lakeworkspace")
Append to a table in the default workspace in the data lake tier.
write_options = {
    'mode': 'append'
}
data_provider.save_as_table(dataframe, "CustomTable1_SPRK", write_options=write_options)
Create a new custom table in the analytics tier:
data_provider.save_as_table(dataframe, "CustomTable1_SPRK_CL", "analyticstierworkspace")
Append to or overwrite an existing custom table in the analytics tier:
write_options = {
    'mode': 'append'
}
data_provider.save_as_table(dataframe, "CustomTable1_SPRK_CL", "analyticstierworkspace", write_options)
Append to the default database with partitioning on the TimeGenerated column:
data_provider.save_as_table(dataframe, "table1_SPRK", write_options={'mode': 'append', 'partitionBy': ['TimeGenerated']})
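Putting these methods together, a common pattern is to read a table, transform it with Spark, and persist the result as a new custom table in the data lake tier. A minimal sketch; the table name, column name, and filter value are illustrative:
# Hypothetical read-transform-write flow against the default workspace.
df = data_provider.read_table("EntraGroups")
filtered = df.filter(df["TimeGenerated"] >= "2024-01-01")  # assumes a TimeGenerated column
write_options = {
    'mode': 'overwrite'
}
data_provider.save_as_table(filtered, "FilteredEntraGroups_SPRK", write_options=write_options)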
delete_table
Deletes a table from the data lake tier. You can delete a table from the lake tier by using the _SPRK suffix in your table name. You can't delete a table from the analytics tier with this function; to delete a custom table in the analytics tier, use the Log Analytics API functions. For more information, see Add or delete tables and columns in Azure Monitor Logs.
data_provider.delete_table({table_name}, [database], [id])
Parameters:
- table_name (str): The name of the table to delete.
- database (str, optional): The name of the database (workspace) containing the table. Default value: default.
- id (str, optional): The unique identifier of the database if workspace names aren't unique.
Returns:
dict: A dictionary containing the result of the delete operation.
Example:
data_provider.delete_table("customtable_SPRK", "lakeworkspace")