Polyglot Notebooks: A practical introduction

Following the Jupyter model, the .NET-centered Polyglot notebooks combine Markdown and code. They allow multiple languages in the same notebook.

(Image: jakkaje879/Shutterstock.com)

Apr 20, 2024 at 1:03 pm CEST

By

Dr. Eike M. Hirdes
Arthur Grot

This article was originally published in German and has been automatically translated.

Microsoft's Polyglot Notebooks are modelled after the interactive Jupyter Notebooks, but are .NET-centric. This article provides a practical introduction to Polyglot Notebooks and uses an example to explain their basic concepts: setup, use of different programming languages, variable sharing and displaying results. Limitations in use and further topics are also discussed.

What are Polyglot Notebooks?

Polyglot Notebooks allow users to enrich code with documentation (in Markdown format), output the results and visualize them. The code blocks can be executed individually directly from the notebook and even combined with each other. Unlike the Jupyter notebooks often used for Python, they are designed to run multiple languages within one notebook.

As a result, Polyglot Notebooks offer a wide range of possible applications in software development.

Arthur Grot is a software developer at AIT GmbH & Co. KG. He helps turn his customers' visions into innovative cloud and IoT software solutions using .NET and Azure technologies.

Dr. Eike M. Hirdes is an Azure DevOps Enabler at AIT GmbH & Co. KG. He advises companies on agile processes, automation and administration, mainly in the area of Azure DevOps. As the author of the blog series AIT Tech Talk and Azure DevOps Nugget, he regularly publishes articles on the AIT blog.

Setting up Polyglot in VS Code

To use Polyglot Notebooks, two things must be installed: the .NET SDK in version 7.0 or later and the Polyglot Notebooks extension for the source code editor Visual Studio Code (VS Code), which is available from the Visual Studio Marketplace. The extension also automatically installs the required Jupyter extensions.

Once the dependencies are installed, a notebook can be created from the command palette in VS Code (Ctrl+Shift+P) with the command "Polyglot Notebook: Create default notebook". In this example, the .dib format is selected first. An explanation of the different file formats with their advantages and disadvantages will follow later. After selecting the format, a notebook opens with ".NET Interactive" as the selected kernel (Figure 1). The kernel is a service running in the background that executes the code from the notebook and returns the result as output. Each language has its own subkernel, which is responsible for executing the respective language.

Newly created Polyglot notebook in VS Code (Fig. 1)

Overview of Polyglot

The header bar not only displays the kernel, but also makes it possible to create additional code and markdown cells. A cell is a structural unit in the notebook that contains either code of a specific language or Markdown. This makes it possible to write code and Markdown documentation alternately (Figure 2).

Polyglot notebook overview (Fig. 2)

The "Run All" command executes all cells in the notebook one after the other and displays the output. If an error leaves the kernel in a faulty state during execution, the kernel can be restarted. The "Variables" menu item opens a view that lists all variables in the kernel. Each code cell can also be executed individually via the "Play" symbol (▷).

Polyglot's multi-language support currently includes the following programming and markup languages: C#, F#, PowerShell, JavaScript, SQL, KQL, HTML, Mermaid, Python and R. VS Code offers syntax highlighting, code autocompletion and other features for the languages. Since December 2023, Polyglot has also supported the HTTP protocol, which means that HTTP requests can be sent directly from notebooks and the responses displayed.

Polyglot Notebooks explained using an example

The example is built from scratch to demonstrate the features of Polyglot Notebooks. The scenario: for a recall campaign, it must be determined quickly which customers have bought a certain product. This data is stored in an SQL database. In the first step, the notebook asks for the access data and sets up the SQL connection string. In the next step, it connects to the database and retrieves the required data. In the last step, the data is displayed as a diagram that shows at a glance which customers are affected and to what extent.

The example shows how different languages and technologies (Polyglot Magic Commands, C#, SQL and Mermaid, which creates graphics from Markdown) can be used in a notebook and how they interact with each other. Magic Commands are special commands for Polyglot notebooks that make it possible to execute code in other languages within a code cell or to control the notebook environment, for example to set variables.

Creating the SQL connection string - processing user input and variables

Once Polyglot has been set up, the connection string for the SQL database is created. This requires four parameters: the URL of the SQL server, the name of the database, a username and the password. The first two parameters are fixed, the last two should be entered by the user.

Markdown can be used to explain both the entire notebook and individual code cells. For this connection string, variables should be permanently defined, but users should also be asked for access data. The following cell in Markdown format is used for documentation and instructions:

## Create SQL Connection String
Run the following cell to create a sql connection string.
You will be prompted to input username and password.

Next, create a C# cell to permanently define the URL of the SQL server (serverUrl) and the name of the database (databaseName) as variables. The username (username) and the password (password) are to be queried. The implementation for the query with Polyglot Magic Commands can be found in Listing 1. After calling the Magic Commands in the C# cell, the created variables can be used in the C# kernel.

// Fetch user input with magic commands
var serverUrl = "polyglot.database.windows.net";
var databaseName = "polyglot";

#!set --value @input:"Please provide a username" --name username
#!set --value @password:"Please provide a password" --name password

Listing 1: Querying user data with Magic Commands in a C# cell

If you prefer to implement this completely in C#, you must first integrate the Microsoft.DotNet.Interactive library, here as a NuGet package. The C# alternative can be seen in Listing 2.

// Fetch user input with C#
using Microsoft.DotNet.Interactive;

var serverUrl = "polyglot.database.windows.net";
var databaseName = "polyglot";

var username = await Kernel.GetInputAsync("Please provide a username");
var password = await Kernel.GetPasswordAsync("Please provide a password");

Listing 2: Querying user data in a C# cell

Both variants support the two input types "Input" and "Password". With "Input", the entered text is displayed in plain text, whereas with "Password" it is masked. Note: only the entry itself is masked; the content remains visible in the variable view of Polyglot (Figure 3).

Attention! Requested passwords are visible in the variable view (Fig. 3).

The SQL connection string is composed of the previously defined variables:

string connectionString = $"Server={serverUrl};Database={databaseName}; User Id='{username}';" +
    $"Password='{password}';Persist Security Info=true;Integrated Security=false;";
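
String interpolation works as long as the entered values contain no characters that have a special meaning in connection strings, such as semicolons or quotation marks. As a minimal sketch of a more defensive variant, the generic DbConnectionStringBuilder class from System.Data.Common can assemble the string and quote values where needed. The hard-coded username and password below are placeholders for the values queried in Listing 1 or Listing 2; they are assumptions for illustration only.

```csharp
using System;
using System.Data.Common;

// Placeholder values (assumptions); in the notebook they come from the earlier cells.
var serverUrl = "polyglot.database.windows.net";
var databaseName = "polyglot";
var username = "demo-user";
var password = "pa;ss'word"; // contains characters that need quoting

// The builder quotes values where the connection string syntax requires it.
var builder = new DbConnectionStringBuilder();
builder["Server"] = serverUrl;
builder["Database"] = databaseName;
builder["User Id"] = username;
builder["Password"] = password;
builder["Persist Security Info"] = "true";
builder["Integrated Security"] = "false";

string connectionString = builder.ConnectionString;

// Avoid printing the full string, because it contains the password.
Console.WriteLine($"Connection string with {builder.Count} keys created.");
```

For SQL Server specifically, the provider's own SqlConnectionStringBuilder offers typed properties for the same purpose; the generic class is used here only because it requires no additional package.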

Connecting to a Microsoft SQL Server database

The sample data in SQL is provided by Microsoft's AdventureWorks training database, which can be provided both on your own SQL server and directly as an Azure SQL Database. Microsoft provides instructions.

Please note that Integrated Security=false was set in the connection string used above, as the example uses SQL Server Authentication. If Windows Authentication is used, this must be adapted.

To use SQL, the Microsoft.DotNet.Interactive.SqlServer library must first be installed. As it is an external package, it must be referenced using the #r syntax (see also Referencing external assemblies):

// magic commands for SQL Server connection
#r "nuget: Microsoft.DotNet.Interactive.SqlServer, *-*"

Only then can the SQL connection string be used to create a Polyglot SQL subkernel. This is done with the magic command #!connect. To be able to use the C# variable connectionString, the name of the subkernel must be specified (here @csharp, see next listing). The topic of variable sharing is covered in more detail in the next sections.

The next step is to create an SQL subkernel with the SQL connection string in a C# cell:

#!connect mssql --kernel-name DemoKernel @csharp:connectionString

Users of the notebook can now either use the default productId or be prompted for it. The default productId is defined in an SQL cell as follows:

-- use default productId
#!set --name productId --value "988"

Alternatively, it can be requested from the user:

-- user defined productId
#!set --name productId --value @input:productId

Next, the SQL query must be created. It joins the two tables SalesOrderHeader and SalesOrderDetail to obtain the customer number (CustomerID) and the number of products purchased per order (OrderQty). Saving the result in a variable makes it available outside the query. In this example (Listing 3), it is the variable QueryResult.

#!sql-DemoKernel --name QueryResult
SELECT CustomerID, SalesOrderHeader.SalesOrderID, ProductID, OrderQty
  FROM SalesLT.SalesOrderHeader JOIN SalesLT.SalesOrderDetail
          ON SalesLT.SalesOrderDetail.SalesOrderID = SalesLT.SalesOrderHeader.SalesOrderID
WHERE ProductID = @productId
ORDER BY OrderQty desc

Listing 3: Calling the SQL query from Polyglot in an SQL cell

Immediately after executing the SQL cell, the response appears in the notebook (see Figure 4).

Output of the SQL command in Listing 3 (Fig. 4)
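
For readers more at home in C# than in SQL, the logic of Listing 3 can be mirrored in LINQ. The sketch below is not part of the notebook. It runs on a handful of invented in-memory rows (the tuple layout and all values are hypothetical) and only illustrates what the join, the filter and the sort order do.

```csharp
using System;
using System.Linq;

// Invented sample rows; the real data lives in the AdventureWorks database.
var headers = new (int SalesOrderID, int CustomerID)[]
{
    (1, 100),
    (2, 200),
};
var details = new (int SalesOrderID, int ProductID, int OrderQty)[]
{
    (1, 988, 2),
    (2, 988, 5),
    (2, 712, 1),
};
int productId = 988;

// JOIN on SalesOrderID, WHERE ProductID = @productId, ORDER BY OrderQty DESC
var result = headers
    .Join(details,
          h => h.SalesOrderID,
          d => d.SalesOrderID,
          (h, d) => new { h.CustomerID, h.SalesOrderID, d.ProductID, d.OrderQty })
    .Where(row => row.ProductID == productId)
    .OrderByDescending(row => row.OrderQty)
    .ToList();

foreach (var row in result)
{
    Console.WriteLine($"{row.CustomerID} {row.SalesOrderID} {row.ProductID} {row.OrderQty}");
}
```

The row for product 712 is dropped by the filter, and the remaining rows appear with the largest order quantity first, just as the ORDER BY clause requests.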

Exchanging variables between languages

After executing the SQL query, the output is to be used to generate a graphic. First, however, the variable must be shared from the SQL subkernel with the C# subkernel. Polyglot Notebooks uses the concept of variable sharing for this purpose. This allows variables to be exchanged between almost all supported languages and technologies, except Mermaid and HTML (see Table 1).

Language	Variable sharing
C#	✅
F#	✅
PowerShell	✅
JavaScript	✅
SQL	✅
KQL (Kusto Query Language)	✅
Python	✅
R	✅
HTML	⛔
HTTP	✅
Mermaid	⛔

Table 1: Currently supported languages with a note on whether variable sharing is possible

(Source: Microsoft)

In this example, it was necessary to save the output of the SQL query in a variable QueryResult (see first line in Listing 3). Other languages work natively with variables. These are usually already in the working memory and can be shared without an additional command. Which variables can be found in which subkernel can be looked up in the previously mentioned variable view (see Figure 2).

In both cases, the next step is to create a new cell in a different language. The cell starts with the magic command #!share, whose --from parameter names the subkernel from which a variable is to be shared into the current subkernel, followed by the name of the variable. In this example, both the query output (QueryResult) and the product ID (productId) are to be used in the C# subkernel. After executing the following cell, both values are available as C# variables:

#!share --from sql-DemoKernel QueryResult
#!share --from sql-DemoKernel productId

Creating dynamic diagrams with Mermaid

The JavaScript-based tool Mermaid can be used to generate diagrams in Polyglot Notebooks. However, Mermaid does not support variables. Therefore, the Mermaid code is generated dynamically in C# at this point, based on the output of the SQL query. The generated Mermaid code is then submitted to the Mermaid subkernel, which renders the graphic.

The required variables have already been shared from the SQL subkernel at this point. The product ID is used to generate the basis of the pie chart, including the title, in a string (see Listing 4).

A loop then appends every entry from the query result to this base. Finally, the asynchronous call Kernel.Root.SendAsync() executes the generated code on the Mermaid subkernel.

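#!share --from sql-DemoKernel QueryResult
#!share --from sql-DemoKernel productId

using System.Text;                            // StringBuilder
using Microsoft.DotNet.Interactive;           // Kernel
using Microsoft.DotNet.Interactive.Commands;  // SubmitCode

// Base of the pie chart, including the title
var mermaidCodeStringBuilder = new StringBuilder();
mermaidCodeStringBuilder.AppendLine($"""
pie showData
    title Customers who bought product '{productId}'

"""
);

// Append one pie slice per row of the query result
foreach (var element in QueryResult[0].Data)
{
    var customerId = element.FirstOrDefault(x => x.Key == "CustomerID").Value ?? "Unknown";
    var orderQty = element.FirstOrDefault(x => x.Key == "OrderQty").Value ?? "0";
    mermaidCodeStringBuilder.AppendLine($"\"{customerId}\": {orderQty}");
}

// Send the generated code to the Mermaid subkernel, which renders the diagram
await Kernel.Root.SendAsync(new SubmitCode(mermaidCodeStringBuilder.ToString(), "mermaid"));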

Listing 4: Generating Mermaid code and calling the Mermaid subkernel in a C# cell

The pie chart generated by Mermaid in Figure 5 shows how many units of product 988 each customer has purchased and therefore which customers are affected by the recall.

Image generated with Mermaid: a pie chart showing which customer IDs have purchased product 988 and how often (Fig. 5).
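
The string-building part of Listing 4 can also be written as a small function, which makes it possible to try it out without a database or a running Mermaid subkernel. The following is a sketch under assumptions: the names Cell and BuildPieChart and the sample rows are invented, and the rows are modelled as key/value pairs only because Listing 4 accesses them via Key and Value. Unlike Listing 4, the sketch sums the quantities per customer, so that a customer with several orders yields a single slice.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Returns the value of one column in a row, or null if the column is missing.
object Cell(IEnumerable<KeyValuePair<string, object>> row, string column) =>
    row.FirstOrDefault(cell => cell.Key == column).Value;

// Builds Mermaid pie chart code; quantities are summed per customer.
string BuildPieChart(string productId, IEnumerable<IEnumerable<KeyValuePair<string, object>>> rows)
{
    var totals = new Dictionary<string, long>();
    foreach (var row in rows)
    {
        var customerId = Cell(row, "CustomerID")?.ToString() ?? "Unknown";
        // A missing or unparsable quantity counts as 0.
        long.TryParse(Cell(row, "OrderQty")?.ToString(), NumberStyles.Integer,
                      CultureInfo.InvariantCulture, out var quantity);
        totals.TryGetValue(customerId, out var current);
        totals[customerId] = current + quantity;
    }

    var sb = new StringBuilder();
    sb.AppendLine("pie showData");
    sb.AppendLine($"    title Customers who bought product '{productId}'");
    foreach (var (customerId, total) in totals.OrderByDescending(entry => entry.Value))
    {
        sb.AppendLine($"    \"{customerId}\": {total}");
    }
    return sb.ToString();
}

// Hypothetical rows standing in for QueryResult[0].Data
var sampleRows = new[]
{
    new Dictionary<string, object> { ["CustomerID"] = 29847, ["OrderQty"] = 3 },
    new Dictionary<string, object> { ["CustomerID"] = 30050, ["OrderQty"] = 7 },
    new Dictionary<string, object> { ["CustomerID"] = 29847, ["OrderQty"] = 2 },
};
string mermaidCode = BuildPieChart("988", sampleRows);
Console.WriteLine(mermaidCode);
```

In a notebook, the returned string could then be passed to the Mermaid subkernel in the same way as in the last line of Listing 4.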

Comparison of file formats

Two file formats are available for Polyglot Notebooks: .ipynb and .dib. The .ipynb format is the classic Jupyter notebook format, which saves the output of executed code in the file. This output may contain sensitive data, such as secrets or personal or company-related data, which makes it difficult to check notebooks into source code management systems. It must therefore be ensured that the output is deleted beforehand. In return, the format ensures compatibility with Jupyter and makes it possible to edit notebooks in other applications, such as JupyterLab.

The .dib format, on the other hand, saves the code in its own structure, which is incompatible with Jupyter products. However, the advantage of this format is that the notebook does not contain the output of the executed code, which makes it easier to work with sensitive data such as secrets and a code management system such as Git. In addition, the structure is flatter than the JSON structure of .ipynb files, making it easier to track code changes during the code review.

If compatibility with the Jupyter format is not required, the .dib format should be used due to the advantages described.
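
If a notebook has to stay in the .ipynb format, the stored outputs can be removed before a commit. The following sketch assumes the standard Jupyter layout, in which code cells carry an outputs array and an execution_count field. The function name ClearOutputs and the inline sample notebook are made up for illustration. It uses System.Text.Json, so no additional package is needed.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Removes stored outputs and execution counters from all code cells of an .ipynb document.
string ClearOutputs(string ipynbJson)
{
    JsonNode root = JsonNode.Parse(ipynbJson) ?? throw new ArgumentException("Empty notebook JSON.");
    if (root["cells"] is JsonArray cells)
    {
        foreach (var cell in cells)
        {
            if (cell?["cell_type"]?.GetValue<string>() == "code")
            {
                cell["outputs"] = new JsonArray();   // drop all stored outputs
                cell["execution_count"] = null;      // reset the execution counter
            }
        }
    }
    return root.ToJsonString(new JsonSerializerOptions { WriteIndented = true });
}

// Invented miniature notebook with one Markdown cell and one executed code cell
var sample = """
{
  "cells": [
    { "cell_type": "markdown", "source": ["# Demo"] },
    { "cell_type": "code", "execution_count": 3, "source": ["1 + 1"],
      "outputs": [ { "output_type": "execute_result", "data": { "text/plain": ["2"] } } ] }
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
""";
string cleaned = ClearOutputs(sample);
Console.WriteLine(cleaned);
```

For a real file, the JSON would be read with File.ReadAllText and written back with File.WriteAllText. Markdown cells and the source of code cells remain untouched.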

Stumbling blocks and limitations

Unfortunately, a few stumbling blocks and limitations became apparent while creating the example used here. Firstly, there is the aforementioned problem with the .ipynb file format, which is not specific to Polyglot Notebooks; the alternative .dib format of Polyglot Notebooks is a suitable solution. In addition, the official documentation has gaps and needs improvement in various places, which slows down the actual implementation work. Since Mermaid and HTML do not support variables, the dynamic generation of diagrams or HTML is cumbersome.

A bigger problem, however, is that the Polyglot notebook kernel can enter a faulty state under unknown circumstances and require a kernel restart. If IDE features such as code completion or syntax and error highlighting no longer work, this can be remedied by temporarily selecting a different code cell and then switching back. If necessary, a complete restart of VS Code is required.

Conclusion and outlook

Despite the stumbling blocks, Polyglot Notebooks are a very useful tool. They make it possible to combine different languages quickly, to output the results of the execution and to display them in tables or Mermaid diagrams. This makes them an alternative to LINQPad and RoslynPad for quickly creating prototypes and proofs of concept. With Markdown, the solutions can be documented directly for non-developers and interactive reports can be generated.

Beyond the scope of this article, there are many other exciting topics that are worth looking at. Among other things, Entity Framework can be used to create C# models of databases, which can then be accessed from Polyglot Notebooks. Furthermore, you can refer to other notebooks directly from a Polyglot notebook and start programs from them, so that it is not necessary to describe everything in a single notebook. Thanks to the integrated Mermaid diagrams, the notebooks are also suitable for technical documentation, for example via class and sequence diagrams. Further suggestions can be found in the examples in the Polyglot Notebooks GitHub repository.

(mki)
