Announcing SynapseML v0.11. The new version contains many new features to help you build scalable machine learning pipelines.
We are pleased to announce SynapseML v0.11, a new version of our open-source distributed machine learning library that simplifies and accelerates the development of scalable AI. This release introduces many new features from the past year of development, as well as many bug fixes and improvements. This post gives a high-level overview of the most salient additions; curious readers can check out the full release notes for everything new.
OpenAI Language Models and Embeddings
A new release wouldn’t be complete without joining the large language model (LLM) hype train, and SynapseML v0.11 makes large-scale LLM usage simple and easy. In particular, it introduces three new APIs for working with foundation models: `OpenAIPrompt`, `OpenAIEmbedding`, and `OpenAIChatCompletion`. The `OpenAIPrompt` API makes it easy to construct complex LLM prompts from columns of your dataframe. Here’s a quick example of translating a dataframe column called “Description” into emojis.
from synapse.ml.cognitive.openai import OpenAIPrompt

emoji_template = """
Translate the following into emojis
Word: {Description}
Emoji: """

# Service binding setters (subscription key, service name, deployment name)
# are omitted here for brevity.
results = (OpenAIPrompt()
    .setPromptTemplate(emoji_template)
    .setErrorCol("error")
    .setOutputCol("Emoji")
    .transform(inputs))
This code will automatically look for a dataframe column called “Description” and prompt your LLM (ChatGPT, GPT-3, GPT-4) with the created prompts. Our new OpenAI embedding classes make it easy to embed large tables of sentences quickly from your Apache Spark clusters. To learn more, see our docs on using the OpenAI embedding API and the SynapseML KNN model to create an LLM-based vector search engine directly on your Spark cluster. Finally, the new OpenAIChatCompletion transformer allows users to submit large quantities of chat-based prompts to ChatGPT, enabling parallel inference of thousands of conversations at a time. We hope you find the new OpenAI integrations useful for building your next intelligent application.
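For instance, here is a hedged sketch of OpenAIChatCompletion; the setter names follow the v0.11 docs, while the Azure OpenAI resource, key, deployment name, and message schema are placeholders and assumptions.

from synapse.ml.cognitive.openai import OpenAIChatCompletion
from pyspark.sql import Row

# Each row holds one conversation: a list of role/content messages (assumed schema).
chat_df = spark.createDataFrame(
    [([Row(role="system", content="You are a helpful assistant."),
       Row(role="user", content="Name a color.")],)],
    ["messages"])

chat_completion = (OpenAIChatCompletion()
    .setSubscriptionKey("<azure-openai-key>")         # placeholder
    .setCustomServiceName("<azure-openai-resource>")  # placeholder
    .setDeploymentName("gpt-35-turbo")                # placeholder deployment
    .setMessagesCol("messages")
    .setErrorCol("error")
    .setOutputCol("chat_completions"))

results = chat_completion.transform(chat_df)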
Simple Deep Learning
SynapseML v0.11 introduces a new simple deep learning package that allows training custom text and vision classifiers with only a few lines of code. This package combines the power of distributed deep network training with PyTorch Lightning and the simple, easy-to-use APIs of SynapseML. The new API lets users fine-tune visual foundation models from torchvision as well as a variety of state-of-the-art text backbones from HuggingFace.
Here’s a quick example showing how to fine-tune custom vision networks:
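The sketch below follows the public DeepVisionClassifier quickstart; the parameter values are illustrative, and train_df, test_df, store, and callbacks are assumed to come from your own data and horovod.spark setup.

from synapse.ml.dl import DeepVisionClassifier

# train_df: Spark dataframe with an image-path column and an integer label column (assumed schema).
deep_vision_classifier = DeepVisionClassifier(
    backbone="resnet50",   # any torchvision backbone
    store=store,           # horovod.spark store for intermediate data (assumed setup)
    callbacks=callbacks,   # optional PyTorch Lightning callbacks
    num_classes=17,
    batch_size=16,
    epochs=5,
    validation=0.1)

deep_vision_model = deep_vision_classifier.fit(train_df)
predictions = deep_vision_model.transform(test_df)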
Keep an eye out for upcoming releases of SynapseML featuring additional simple deep learning algorithms that will make it easier than ever to train and deploy models at scale.
LightGBM v2
LightGBM is one of the most used features of SynapseML, and we heard your feedback asking for better performance! SynapseML v0.11 introduces a completely refactored integration between LightGBM and Spark, called LightGBM v2. This integration aims for high performance by introducing a variety of new streaming APIs in the core LightGBM library that enable fast and memory-efficient data sharing between Spark and LightGBM. In particular, the new “streaming execution mode” has a >10x lower memory footprint than earlier versions of SynapseML, yielding fewer memory issues and faster training. Best of all, you can enable the new mode by passing a single extra flag to your existing LightGBM models in SynapseML.
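A minimal sketch: the flag name below, dataTransferMode, is our reading of the SynapseML LightGBM docs and should be treated as an assumption; the rest is a standard SynapseML LightGBM setup.

from synapse.ml.lightgbm import LightGBMClassifier

model = LightGBMClassifier(
    objective="binary",
    featuresCol="features",
    labelCol="label",
    dataTransferMode="streaming")  # assumed flag enabling the new low-memory streaming mode

trained = model.fit(train_df)  # train_df: your prepared Spark dataframe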
ONNX Model Hub
SynapseML supports a variety of deep learning integrations with the ONNX runtime for fast, hardware-accelerated inference in all of the SynapseML languages (Scala, Java, Python, R, and .NET). In version 0.11 we add support for the ONNX model hub, an open collection of state-of-the-art pre-trained ONNX models that can be quickly downloaded and embedded into Spark pipelines. This allowed us to completely deprecate and remove our old dependency on the CNTK deep learning library.
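As a sketch of how a hub model might be embedded into a pipeline: the hub lookup uses the onnx package's onnx.hub module, the ONNXModel setters follow SynapseML's ONNX docs, and the model name plus the input/output names in the feed/fetch dictionaries are assumptions that depend on the chosen model.

from onnx import hub
from synapse.ml.onnx import ONNXModel

# Download a pre-trained model from the ONNX model hub (model name is illustrative).
model_proto = hub.load("resnet50")

onnx_model = (ONNXModel()
    .setModelPayload(model_proto.SerializeToString())
    .setDeviceType("CPU")
    .setFeedDict({"data": "features"})                  # ONNX input name -> dataframe column (assumed names)
    .setFetchDict({"scores": "resnetv17_dense0_fwd"}))  # output column <- ONNX output name (assumed names)

scored = onnx_model.transform(features_df)  # features_df: your prepared Spark dataframe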
To learn more about how you can embed deep networks into Spark pipelines, check out our ONNX episode in the new SynapseML video series:
Causal Learning
SynapseML v0.11 introduces a new package for causal learning that can help businesses and policymakers make more informed decisions. When trying to understand the impact of a “treatment” or intervention on an outcome, traditional approaches like correlation analysis or prediction models fall short as they do not necessarily establish causation. Causal inference aims to overcome these shortcomings by bridging the gap between prediction and decision-making. SynapseML’s causal learning package implements a technique called “Double machine learning”, which allows us to estimate treatment effects without data from controlled experiments. Unlike regression-based approaches, this approach can model non-linear relationships between confounders, treatment, and outcome. Users can run the DoubleMLEstimator using a simple code snippet like the one below:
from pyspark.ml.classification import LogisticRegression
from synapse.ml.causal import DoubleMLEstimator

dml = (DoubleMLEstimator()
    .setTreatmentCol("Treatment")
    .setTreatmentModel(LogisticRegression())
    .setOutcomeCol("Outcome")
    .setOutcomeModel(LogisticRegression())
    .setMaxIter(20))

dmlModel = dml.fit(dataset)
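Once fitted, the model exposes the estimated effect; the accessor names below are taken from the SynapseML causal docs and assumed available in v0.11.

# Average treatment effect and its confidence interval (assumed accessors).
print(dmlModel.getAvgTreatmentEffect())
print(dmlModel.getConfidenceInterval())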
For more information, be sure to check out Dylan Wang’s guided tour of the DoubleMLEstimator on the SynapseML video series:
Vowpal Wabbit v2
Finally, SynapseML v0.11 introduces Vowpal Wabbit v2, the second-generation integration between the Vowpal Wabbit (VW) online optimization library and Apache Spark. With this update, users can work with VW's native data format directly using the new “VowpalWabbitGeneric” model, which makes Spark easier to adopt for existing VW users. The more direct integration also adds support for new cost functions and use cases, including “multi-class” and “cost-sensitive one against all” problems. The update also introduces a new progressive validation strategy and a Contextual Bandit offline policy evaluation notebook that demonstrates how to evaluate VW models on large datasets.
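A hedged sketch of the new model: it assumes a dataframe with a single "input" column of VW-formatted example strings and a setPassThroughArgs setter for native VW arguments, both of which are our reading of the v2 docs.

from synapse.ml.vw import VowpalWabbitGeneric

# Each row is one example in VW's native text format (assumed "input" column name).
train_df = spark.createDataFrame(
    [("1 | price:0.23 sqft:0.25",),
     ("-1 | price:0.53 sqft:0.32",)],
    ["input"])

model = (VowpalWabbitGeneric()
    .setPassThroughArgs("--loss_function logistic")  # any native VW arguments (assumed setter)
    .fit(train_df))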
Conclusion
In conclusion, we are thrilled to share the new SynapseML release with you and hope you will find that it simplifies your distributed machine learning pipelines. This blog covered only the highlights, so be sure to check out the full release notes for all the updates and new features. Whether you are working with large language models, training custom classifiers, or performing causal inference, SynapseML makes it easier and faster to develop and deploy machine learning models at scale.
How to create a custom extension for Azure DevOps
In some cases, you need to create a custom extension for Azure DevOps, whether to add functionality that is not available natively or to modify existing functionality that does not meet your project's needs. In this article, we will show how to create a custom extension for Azure DevOps and how to publish it to the Azure DevOps Marketplace.
Before you begin, make sure you have:
An Azure DevOps account. If you don't have one yet, you can create one by following the instructions available here.
A code editor installed, such as Visual Studio Code, which can be downloaded from code.visualstudio.com.
The LTS version of Node.js installed, available for download at nodejs.org.
The TypeScript compiler installed, version 4.0.2 or higher recommended. It can be installed via npm at npmjs.com.
The TFX CLI installed, version 0.14.0 or higher recommended. It can be installed globally via npm with the command npm i -g tfx-cli; see the TFX-CLI package page for more details.
Setting up the development environment
Create a folder for the extension, for example, my-extension, and inside it create a subfolder, for example, task.
Open a terminal in the folder you created and run the command npm init -y; the -y flag accepts all the default options. You will notice that a file called package.json was created, containing the extension's metadata.
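Next, in the extension root (my-extension), create the extension manifest, a file called vss-extension.json, which the TFX CLI uses later. A minimal sketch is shown below; every ID, name, and path is an illustrative placeholder, not the original post's exact manifest.

{
  "manifestVersion": 1,
  "id": "<extension-id>",
  "version": "1.0.0",
  "name": "My Extension",
  "publisher": "<publisher-id>",
  "targets": [{ "id": "Microsoft.VisualStudio.Services" }],
  "icons": { "default": "images/icon.png" },
  "files": [{ "path": "task" }],
  "contributions": [
    {
      "id": "my-extension-task",
      "type": "ms.vss-distributed-task.task",
      "targets": ["ms.vss-distributed-task.tasks"],
      "properties": { "name": "task" }
    }
  ]
}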
Replace the <> with a unique ID for your extension; you can generate an ID here. Replace the <> with the publisher ID created in step 1 of the publishing stage.
In the root folder of your extension, my-extension, create a folder called images and add an image called icon.png with a size of 128×128 pixels. This image will be used as your extension's icon in the Marketplace.
Creating the extension
After setting up the environment, you can create the extension.
In the task folder, create a file called task.json and add the following content:
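A minimal sketch of task.json follows; the GUID, names, and Node handler version are illustrative placeholders rather than the original post's exact content.

{
  "$schema": "https://raw.githubusercontent.com/Microsoft/azure-pipelines-task-lib/master/tasks.schema.json",
  "id": "<guid>",
  "name": "my-extension",
  "friendlyName": "My Extension",
  "description": "A custom task that does nothing yet",
  "author": "<publisher-id>",
  "category": "Utility",
  "version": { "Major": 1, "Minor": 0, "Patch": 0 },
  "instanceNameFormat": "Run My Extension",
  "inputs": [],
  "execution": { "Node10": { "target": "index.js" } }
}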
Replace the <> with the same GUID generated in step 8 of the environment setup stage.
This file describes the task that will run in the pipeline. At this point the extension doesn't do anything yet, but you can add inputs and the logic to run whatever you need.
Next, create a file called index.ts (the TypeScript source that will be compiled to index.js) and add the following content:
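A minimal sketch of the task entry point, assuming the azure-pipelines-task-lib package (install it with npm i azure-pipelines-task-lib); the message text matches the local run step below.

import tl = require('azure-pipelines-task-lib/task');

async function run(): Promise<void> {
    try {
        // Task logic goes here; for now, just report success.
        console.log('My Extension Succeeded!');
        tl.setResult(tl.TaskResult.Succeeded, 'My Extension Succeeded!');
    } catch (err) {
        tl.setResult(tl.TaskResult.Failed, (err as Error).message);
    }
}

run();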
This file is responsible for executing the extension. In this case, it just returns a success message, but you can add the logic to run whatever you need.
In the task folder, add an image called icon.png with a size of 32×32 pixels. This image will be used as your extension's icon in Azure Pipelines.
In the terminal, run the tsc command to compile the TypeScript code to JavaScript. This command will generate a file called index.js in the task folder.
To run the extension locally, run the command node index.js. You should see the message My Extension Succeeded!.
Publishing the extension
When your extension is ready, you can publish it to the Marketplace. To do this, you will need to create a publisher in the Marketplace.
Go to the Marketplace and click Publish Extension. After logging in, you will be redirected to the publisher creation page. Fill in the fields and click Create.
Image 1 – Creating a publisher in the Marketplace portal
In the terminal, in the my-extension folder, run the command tfx extension create --manifest-globs vss-extension.json. This command will generate a .vsix file (for example, my-extension-1.0.0.vsix), which is the file that will be published to the Marketplace.
Image 2 – TFX CLI command line creating an extension
Go to the extension publishing page in the Marketplace, click New extension, and then Azure DevOps. Select the my-extension-1.0.0.vsix file and click Upload.
Image 3 – Screen for uploading an extension to the Marketplace
If everything goes well, you will see something like the image below.
Image 4 – Example of an extension published on the Marketplace
With the extension published, you will need to share it with your organization. To do this, click the extension's context menu and click Share/Unshare.
Image 5 – Extension options menu with the share option highlighted
Click + Organization.
Image 6 – Screen for sharing an extension
Type the name of your organization; when you click outside the text box, validation runs and the extension is shared.
Image 7 – Example of sharing an extension with an organization
Installing the extension in your organization
After publishing the extension to the Marketplace, you can install it in your organization by following the steps below.
Click the extension's context menu and click View Extension.
Image 8 – Extension menu with the view option highlighted
You will see something like the image below.
Image 9 – The extension's page in the Marketplace
Click Get it free.
Make sure your organization is selected and click Install.
Image 10 – Extension installation screen in the Marketplace
If the installation goes well, you will see something like the image below.
Image 11 – Extension installation confirmation screen
After installation, the extension appears in the list of installed extensions and can be used in your pipelines.
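For reference, a hypothetical pipeline step that references the task by the name declared in task.json; the task name and major version below come from the sketch above, not from the original post.

steps:
  - task: my-extension@1
    displayName: Run my custom extension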
Conclusion
Custom extensions in Azure DevOps unlock functionality that is not available natively. In this article, you learned how to create a custom extension and how to publish it to the Marketplace. I hope you enjoyed it and can apply what you learned in your projects.
Documents can contain table data. For example, earnings reports, purchase order forms, and technical and operational manuals contain critical data in tables. You may need to extract this table data into Excel for various scenarios:
Extract each table into a specific worksheet in Excel.
Extract the data from all the similar tables and aggregate that data into a single table.
Here, we present two ways to generate Excel from a document’s table data:
Azure Function (HTTP Trigger based): This function takes a document and generates an Excel file with the table data in the document.
Apache Spark in Azure Synapse Analytics (in case you need to process large volumes of documents).
The Azure function extracts table data from the document using Form Recognizer’s “General Document” model and generates an Excel file with all the extracted tables. The following is the expected behavior:
Each table on a page gets extracted and stored in a sheet of the Excel document. The sheet name corresponds to the page number in the document.
Sometimes, there are key-value pairs on the page that need to be captured in the table. If you need that feature, leverage the add_key_value_pairs flag in the function.
Form Recognizer extracts column and row spans, and we take advantage of this to present the data as it is represented in the actual table.
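A condensed sketch of this approach, assuming the azure-ai-formrecognizer and openpyxl Python packages; the endpoint, key, and file names are placeholders, this is not the repository's exact code, and span handling via merged cells is omitted.

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from openpyxl import Workbook

client = DocumentAnalysisClient("https://<resource>.cognitiveservices.azure.com/",
                                AzureKeyCredential("<key>"))

# Analyze the document with the General Document model.
with open("report.pdf", "rb") as f:
    result = client.begin_analyze_document("prebuilt-document", f).result()

workbook = Workbook()
workbook.remove(workbook.active)  # drop the default empty sheet
for i, table in enumerate(result.tables):
    page = table.bounding_regions[0].page_number
    sheet = workbook.create_sheet(title=f"page{page}-table{i}")
    for cell in table.cells:
        # openpyxl is 1-indexed; row_span/column_span could drive merged cells.
        sheet.cell(row=cell.row_index + 1, column=cell.column_index + 1, value=cell.content)

workbook.save("tables.xlsx")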
Following are two sample extractions.
The top Excel output includes the key-value pairs added to the table; the bottom one is without the key-value pairs.
The Excel shown above is the extraction of table data from an earnings report. The earnings report file had multiple pages with tables, and the fourth page had two tables.
Solution
The Azure Function and Synapse Spark notebook are available in this Git repository.
Disclaimer
This document is not meant to replace any official documentation, including what is found at docs.microsoft.com. Those documents are continually updated and maintained by Microsoft Corporation. If there is a discrepancy between this document and what you find in the Compliance user interface (UI) or in a reference on docs.microsoft.com, you should always defer to the official documentation and contact your Microsoft account team as needed. Links to docs.microsoft.com will be referenced both in the document steps and in the appendix.
All the following steps should be done with test data, and where possible, testing should be performed in a test environment. Testing should never be performed against production data.
Target Audience
Microsoft customers who want to better understand Microsoft Purview.
Document Scope
The purpose of this document (and series) is to provide insights into various use cases, announcements, customer-driven questions, etc.
Topics for this blog entry
Here are the topics covered in this issue of the blog:
Sensitivity Labels relating to SharePoint Lists
Sensitivity Label Encryption versus other types of Microsoft tenant encryption
How Sensitivity Labels conflicts are resolved
How to apply Sensitivity Labels to existing SharePoint Sites
Where can I find information on how Sensitivity Labels are applied to data within a SharePoint site (i.e., file label inheritance from the site label)?
Out-of-Scope
This blog series and entry is only meant to provide information; for your specific use cases or needs, it is recommended that you contact your Microsoft account team to explore other possible solutions.
Sensitivity labels and SharePoint Sites – Assorted topics
Sensitivity Label Encryption versus other types of Microsoft tenant encryption
Question #1
How does the encryption provided by Sensitivity Labels compare to the encryption leveraged by BitLocker?
Answer #1
The following table breaks this down in detail and is taken from the following Microsoft Link.
Sensitivity Labels relating to SharePoint Lists
Question #2
Can you apply Sensitivity Labels to SharePoint Lists?
Answer #2
The simple answer is NO while the data is in the list, but YES once the list is exported to a file format.
Data in a SharePoint List is stored in a SQL table in SharePoint. At the time of writing this blog, you cannot apply a Sensitivity Label to SharePoint Online tables, including SharePoint Lists.
SharePoint Lists allow the data in the list to be exported to a file format, and an automatic sensitivity label policy can apply a label to those file formats. Here is an example below of those export options.
How to apply Sensitivity Labels to existing SharePoint Sites
Question #3
Can you apply Sensitivity Labels to existing SharePoint sites? If so, can this be automated (e.g., with PowerShell)?
Answer #3
You can leverage PowerShell to apply sensitivity labels to multiple sites. Here is the link that explains how to accomplish this.
Look for these two sections in the link below for details:
Use PowerShell to apply a sensitivity label to multiple sites
View and manage sensitivity labels in the SharePoint admin center
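As a hedged illustration of that approach: Set-SPOSite's -SensitivityLabel parameter is covered in the documentation linked above, while the admin URL, site filter, and label value below are placeholders.

# Requires the SharePoint Online Management Shell (Microsoft.Online.SharePoint.PowerShell).
Connect-SPOService -Url "https://contoso-admin.sharepoint.com"

# Apply a sensitivity label (name or GUID) to every site returned by the query.
$sites = Get-SPOSite -Limit All
foreach ($site in $sites) {
    Set-SPOSite -Identity $site.Url -SensitivityLabel "Confidential"
}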
How Sensitivity Label conflicts are resolved
Question #4
If an existing file has a Sensitivity Label that is stricter than the Sensitivity Label being inherited from the SharePoint site label, which Sensitivity Label is applied to the file?
Answer #4
Please refer to the link and table below for how Sensitivity Label conflicts are handled. Notice that a higher priority label or a user-applied label would not be overridden by a site label or an automatic labeling policy.
“When SharePoint is enabled for sensitivity labels, you can configure a default label for document libraries. Then, any new files uploaded to that library, or existing files edited in the library will have that label applied if they don’t already have a sensitivity label, or they have a sensitivity label but with lower priority.
For example, you configure the Confidential label as the default sensitivity label for a document library. A user who has General as their policy default label saves a new file in that library. SharePoint will label this file as Confidential because of that label’s higher priority.”