Azure AD Application Proxy now supports the Remote Desktop Services web client



Howdy folks!

 

Today we’re announcing the public preview of Azure AD Application Proxy (App Proxy) support for the Remote Desktop Services (RDS) web client. Many of you are already using App Proxy for applications hosted on RDS and we’ve seen a lot of requests for extending support to the RDS web client as well.

 

With this preview, you can now use the RDS web client even when App Proxy provides secure remote access to RDS. The web client works on any HTML5-capable browser such as Microsoft Edge, Internet Explorer 11, Google Chrome, Safari, or Mozilla Firefox (v55.0 and later). You can push full desktops or remote apps to the Remote Desktop web client. The remote apps are hosted on the virtualized machine but appear as if they’re running on the user’s desktop like local applications. The apps also have their own taskbar entry and can be resized and moved across monitors.

 

Launch rich client apps with a full desktop-like experience

 

Why use App Proxy with RDS?

RDS allows you to extend virtual desktops and applications to any device while helping keep critical intellectual property secure. By using this virtualization platform, you can deploy all types of applications such as Windows apps and other rich client apps as-is with no re-writing required. By using App Proxy with RDS you can reduce the attack surface of your RDS deployment by enforcing pre-authentication and Conditional Access policies like requiring Multi-Factor Authentication (MFA) or using a compliant device before users can access RDS. App Proxy also doesn’t require you to open inbound connections through your firewall.

 

Getting started

To use the RDS web client with App Proxy, first make sure your App Proxy connectors are updated to the latest version, 1.5.1975.0. If you haven’t already, you will need to configure RDS to work with App Proxy. In this configuration, App Proxy handles the internet-facing component of your RDS deployment and protects all traffic with pre-authentication and any Conditional Access policies in place. For steps on how to do this, see Publish Remote Desktop with Azure AD Application Proxy.

 

How Azure AD App Proxy works in an RDS deployment

 

Configure the Remote Desktop web client

Next, complete setup by enabling the Remote Desktop web client for user access. See details on how to do this at Set up the Remote Desktop web client for your users. Now your users can use the external URL to access the client from their browser, or they can launch the app from the My Apps portal.

 

As always, we’d love to hear any feedback or suggestions you may have. Please let us know what you think in the comments below or on the Azure AD feedback forum.

 

Best regards,

Alex Simons (@alex_a_simons)

Corporate Vice President Program Management

Microsoft Identity Division

 

Learn more about Microsoft identity:

Multi-cloud Data Virtualisation on Azure



Many organisations have established their data lake on Azure to manage their end-to-end data analytics estate. In some cases, organisations’ customers and partners use other cloud providers, and we want to meet them wherever they are; after all, Azure is an open and versatile platform.

 

If the partner or customer is already using Azure, there are myriad options to move data into their estate. Azure Data Share, however, stands out, as it is geared towards these cross-tenant data-sharing use cases. It allows users to create invitations, define terms and conditions, define snapshot frequency and type (incremental or full), and revoke shares.

 

Pulling data into Azure from other clouds is also rather straightforward using one of Azure Data Factory’s 90+ copy-activity connectors, including AWS, GCP, Salesforce, Oracle and many more.

Some of these connectors can be used as both a source (read) and a sink (write). Azure-native services, Oracle, SAP, and some others can be used as both source and sink. However, not all connectors support this, in which case developers can fall back to the generic connectors such as ODBC, filesystem, and SFTP.

 

In this blog I want to outline another approach: using Spark to read and write selected datasets to other clouds such as GCS or S3. This methodology applies to virtually any service that has a Spark or Hadoop driver, which gives us bidirectional, on-demand access to any cloud storage. Because data is read into memory, we can join, filter, and aggregate datasets from multiple environments as needed.

Caveat emptor: as data egresses a cloud, you may be subject to network egress costs.

 

Pre-Reqs

  1. Azure subscription, Azure Databricks (ADB) workspace, and Azure Data Factory
  2. Google Cloud Platform subscription

 

Google Cloud

  1. Create a service account at https://console.cloud.google.com/iam-admin/serviceaccounts/ > add a key with type JSON > keep the JSON key file safe
  2. Go to https://console.cloud.google.com/storage/ > create a bucket (single region, standard storage) > grant the “Storage Object Admin” role on the bucket to the service account created in step 1 > upload a test CSV file

 

Azure

  1. Navigate to your Azure Databricks workspace (or create one via Quickstart Guide)

     

  2. Upload your GCS service account JSON key to DBFS storage; you can use the Databricks CLI
    databricks fs cp ./myspecialkey.json "dbfs:/data"

     

  3. Create a cluster using the 6.5 Databricks runtime (includes Apache Spark 2.4.5, Scala 2.11) with the following Spark Config:
    spark.hadoop.fs.gs.auth.service.account.json.keyfile /dbfs/data/myspecialkey.json 
    spark.hadoop.fs.gs.impl com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
    spark.hadoop.fs.gs.project.id {Your-GCP-Project-ID} 
    spark.hadoop.fs.gs.auth.service.account.enable true
    spark.databricks.pyspark.trustedFilesystems com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem,com.databricks.adl.AdlFileSystem,com.databricks.s3a.S3AFileSystem,shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem,shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem

    Note: the spark.databricks.pyspark.trustedFilesystems setting is needed to work around an org.apache.spark.api.python.PythonSecurityException.

     

  4. Once the cluster is created add Google’s GCS connector as a library
    • Clusters > {your cluster} > Libraries > Install New > Maven > Coordinates: com.google.cloud.bigdataoss:gcs-connector:1.5.2-hadoop2
    • Make note of the version; other versions cause reflection and IOException errors because Databricks uses Hadoop 2.x. Later Databricks runtimes (7.1+) may move to Hadoop 3.

     

  5. At this point we can simply use ls, cp, mv, and rm to move data between ADLS storage and GCS. Note: I set up credential passthrough on my cluster, which authenticated me to my data lake using my SSO/Active Directory credentials.
    dbutils.fs.cp("abfss://{filesystem}@{yourADLSaccount}.dfs.core.windows.net/movies/", "gs://{yourGCSbucket}/", True)

    ls_gs.png 

  6. We can also read individual files or folders into a DataFrame.
    df = spark.read.csv("gs://{yourGCSbucket}/{somefile}", header=True)

    This will read data directly from your GCS bucket, note this may incur GCP egress costs.

     2_read_gs_df.png 

  7. From here the cloud is the limit: we can define Spark tables across multiple environments and query across them as if it were all co-located (see the example query after this list). Don’t forget, however, that Spark has to read the data into memory, which may impact performance and egress costs.
    %sql
    CREATE TABLE adlsstest (movie long, title string, genres string, year long, rating long, RottonTomato string) USING CSV LOCATION "abfss://{filesystem}@{yourADLSaccount}.dfs.core.windows.net/movies/moviesDB.csv";
    CREATE TABLE gstest (movie long, title string, genres string, year long, rating long, RottonTomato string) USING CSV LOCATION "gs://{yourGCSbucket}/moviesDB.csv";
    CREATE VIEW myview AS
      SELECT *,'adls' AS src FROM adlsstest
      UNION ALL
      SELECT *,'gcs' as src FROM gstest

jomallin_2-1596027074532.png
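As a quick illustration (a minimal sketch, assuming the table and view names created in the snippet above), you can query the unioned view from PySpark and check how many rows come from each cloud in a single statement:

# query the cross-cloud view defined above; `spark` is the notebook's SparkSession
df = spark.sql("""
    SELECT src, COUNT(*) AS row_count
    FROM myview
    GROUP BY src
""")
df.show()

Because myview unions an ADLS-backed table with a GCS-backed table, this one query pulls data from both clouds at once.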

 

Automation

From here it is a piece of cake to parameterise and automate the movement of data. We can set up an Azure Data Factory pipeline that passes parameters into the Azure Databricks notebook (shown at the end of this section). In this example I copy all files from a specified ADLS directory into a GCS target directory.

 

  1. Create a new notebook in Databricks using the code at the end
  2. Navigate to your Azure Data Factory (or create one via Quickstart Guide)
  3. Create a pipeline with a Databricks activity (here’s a guide)
  4. In the Data Factory pipeline, create 3 parameters: sourcedir, targetdir, and myfile. Define some default values using the respective ABFSS and GS formats:

     4_adf_integration.png

     

  5. Add the notebook path and map the pipeline parameters to the notebook parameters under Azure Databricks activity > Settings

     5_adf_parameters.png

     

     

  6. Click Debug; you can optionally modify the parameter values > click OK > as it is running, click on Details (the spectacles icon) to view the notebook output

     6_adf_run_debug.png

     

  7. Head back to your GCP portal, and you will see the files appear in GCS.

     7_gcp_gs_directory.png

As we are accessing ADLS from an automated job we cannot use credential passthrough. My colleague Nicholas Hurt wrote a great piece discussing different approaches to authenticating to ADLS from ADB.

I am using a service principal for this demo. Azure documentation outlines how to set this up. Also, I set up a secret scope using the Databricks CLI and stored the service principal key there.

 

 

 

 

 

# authenticate to ADLS Gen2 using a service principal and OAuth 2.0
# (the client secret is retrieved from a Databricks secret scope)
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "YOUR-APPLICATION-ID")
spark.conf.set("fs.azure.account.oauth2.client.secret", dbutils.secrets.get(scope = "mykeys", key = "mysp"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/YOUR-TENANT-ID/oauth2/token")

# readdf=spark.read.format("csv").option("header", "true").load("abfss://fs1@.dfs.core.windows.net/movies/moviesDB.csv")

# read the notebook parameters passed in from the Data Factory pipeline
dbutils.widgets.text("sourcedir", "", "")
sourcedir = dbutils.widgets.get("sourcedir")
print("Param -'sourcedir':", sourcedir)

dbutils.widgets.text("targetdir", "", "")
targetdir = dbutils.widgets.get("targetdir")
print("Param -'targetdir':", targetdir)

dbutils.widgets.text("myfile", "", "")
myfile = dbutils.widgets.get("myfile")
print("Param -'myfile':", myfile)

# copy everything from the ADLS source directory to the GCS target directory
##dbutils.fs.cp("abfss://fs1@.dfs.core.windows.net/movies/","gs:///",True)
dbutils.fs.cp(sourcedir, targetdir, True)

# optionally read one of the copied files into a DataFrame to validate the copy
##df = spark.read.csv("gs:///moviesDB.csv",header=True)
df = spark.read.csv(myfile, header=True)

 

 

 

 

 

Conclusion

Using this approach, we can move data between different storage providers as long as they provide a compatible JAR. In fact, this works across S3, GCS, BigQuery and many more.

Further, we discussed how to automate the process to tie in with broader Azure data orchestration. This approach augments the multi-cloud and on-premises data integration capabilities available out of the box with Azure Data Factory.

I should call out that this approach does not support mounts, however that is a minor limitation.

 

Azure Data Explorer at a glance (video)



Please help us share the new animation showing Azure Data Explorer at a glance.

We had planned a real-life video with our customers, but COVID-19 decided otherwise. We made some adjustments, and here are the results! Please enjoy and please share.

 

In 2.5 minutes you’ll get a full explanation of what Azure Data Explorer is and when to use it.

YouTube 

LinkedIn

Azure Data Explorer Docs

Free online Courses: 

    1. How to Start with Microsoft Azure Data Explorer
    2. Exploring Data in Microsoft Azure Using Kusto Query Language and Azure Data Explorer
    3. Microsoft Azure Data Explorer – Advanced KQL

Enjoy.

 


Zero to Hero with App Service, Part 5: Add and Secure a Custom Domain on Your Azure App Service Web



This article is the fifth part of the Zero to Hero with App Service series and assumes you have completed the first article.

 

If you would like to customize your web app and have a domain name other than “azurewebsites.net”, you can add a custom domain to your web app. Moreover, you can secure your custom domain with a free certificate with App Service Managed Certificates, which will give your customers peace of mind when browsing your website. 

 

Prerequisite

 

Before you can add a custom domain to your web app, you need to have purchased a custom domain already. If you don’t have one, you can buy one through App Service Domains; to do that, start with the App Service Domains section of this article. If you already have your own custom domain, proceed to the adding a custom domain to your web app section of the article.

 

App Service Domains

 

App Service Domains lets you create and manage domains hosted on Azure DNS through the Azure portal. The domain can be used for services such as Web Apps and Traffic Manager. Purchasing an App Service Domain also provides the added benefit of privacy protection: your personal data is hidden from the WHOIS public database for free, something that often costs extra with other domain registrars. App Service Domains can also auto-renew your domains and integrate easily with your web apps.

 

To create your App Service Domain, head to the Azure portal and search for “App Service Domain”.

 

Create-ASD.PNG

 

In the domain search bar, type the domain name you would like to purchase. If you don’t see the name in the list of available domains, then the domain isn’t available for purchase. However, you can choose from the suggested list of available domains or enter a new domain you would like to purchase. In the “Contact information” tab, enter your personal information. Then in the “Advanced” tab, choose whether you want to set up auto-renew for the domain. Domain auto-renew prevents accidental loss of domain ownership after expiration. Lastly, decide whether you would like to add privacy protection at no extra charge. Go to “Review + create” to review the legal terms, verify the domain information, and click “Create”. Once your domain has successfully been created, proceed to the adding a custom domain to your web app section of the article.

 

Adding a custom domain to your web app

 

To add a custom domain to your web app, you will need to update your domain’s DNS records. If you purchased an App Service Domain, the DNS records are updated for you automatically and you can proceed to the verifying and adding a custom domain section. Otherwise, you will need to update the DNS records yourself, as described in the next section.

 

Updating DNS records

 

You will need to get the custom domain verification ID of your web app. This token will be used to verify the domain ownership. You can get this value in the “Custom domains” tab of your web app.

 

Get-CDVID - Copy.png

 

Once you have the ID, go to the domain provider of your domain. In the DNS records, create a CNAME and a TXT Record. As an example, if you want to map your ‘www’ subdomain, refer to the chart below:

 

Record Type | Host | Value
CNAME | www | <app-name>.azurewebsites.net
TXT | asuid.www | <custom domain verification ID>

 

Your DNS records page should look something like the following example:

 

dns-records.png

Verifying and adding custom domain

 

After updating your DNS records (if not using App Service Domain):

  1. Go to your App Service and navigate to the “Custom domain” section under “Settings”.
  2. Click on the “Add custom domain” button
  3. Enter the domain that you would like to use
  4. Click “Validate”
  5. If you correctly updated your DNS records and the changes have propagated, you will see the option to “add custom domain”; click it. Otherwise, return to the previous updating DNS records section and make sure your records are correct (you can also verify propagation with the quick check below).
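If you want to confirm that the DNS changes have propagated before clicking “Validate”, the quick check below is one option (a minimal sketch using the third-party dnspython package and a hypothetical www.contoso.com subdomain; substitute your own values):

import dns.resolver  # pip install dnspython

# the CNAME should point at your web app's default *.azurewebsites.net hostname
for rdata in dns.resolver.resolve("www.contoso.com", "CNAME"):
    print("CNAME target:", rdata.target)

# the TXT record on asuid.<subdomain> should contain your custom domain verification ID
for rdata in dns.resolver.resolve("asuid.www.contoso.com", "TXT"):
    print("TXT value:", b"".join(rdata.strings).decode())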

 

Add-Custom-Domain.png

 

Once the custom domain has successfully been added to your web app, you will see it under the list of “Assigned Custom Domains”. You can navigate to your web app using these domain names.

 

If you are interested in securing your custom domain, proceed to the following section on creating an App Service Managed Certificate.

 

Creating an App Service Managed Certificate

 

If you would like to secure your custom domain at no cost, you can create an App Service Managed Certificate and bind it to your domain. With Managed Certificates, you don’t have to worry about renewals, as the certificate is automatically renewed for you!

 

Go to your web app resource and navigate to the “TLS/SSL settings” section under “Settings”. Click on the “Private Key Certificates” blade and look for the “Create App Service Managed Certificate” button.

 

Cert-Blade.png

Select the domain from the dropdown menu that you would like to create a certificate for and click “Create”.

 

Create-Free-Cert.png

 

Once the certificate has been created, you will see it in the list of your private certificates on the “TLS/SSL settings” blade. In order to use this certificate to secure your domain, you will need to bind it to your domain, which is explained in the next section on binding your certificate to your web app.

 

Free-Cert-Created.png

 

 

Binding Your Certificate to Your Web App

 

The final step to securing your domain is to bind your certificate to the domain. In the Portal, go to your web app and navigate to the “Custom domain” section under “Settings”. Look for the domain you want to secure from the list of “Assigned Custom Domains” and click “Add binding”.

 

Binding-Option.png

 

In the following blade…

  1. Select the correct custom domain
  2. Select the App Service Managed Certificate you’ve just created from the dropdown menu
  3. Select “SNI SSL” for the TLS/SSL Type
  4. Click “Add Binding”

 

Add-Binding.png

Once the binding has successfully been created, you will see a green checkmark and the word “Secure” beside your custom domain under the “Assigned Custom Domains” list.
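If you’d like to double-check the binding from outside the portal, the snippet below (a minimal sketch using only the Python standard library and a hypothetical www.contoso.com domain) connects over HTTPS and prints details of the certificate actually being served:

import socket
import ssl

hostname = "www.contoso.com"  # replace with your custom domain
context = ssl.create_default_context()

# open a TLS connection and inspect the peer certificate
with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        cert = tls.getpeercert()
        subject = dict(item[0] for item in cert["subject"])
        print("Issued to:", subject.get("commonName"))
        print("Expires:  ", cert["notAfter"])

If the binding is in place, the common name should match your custom domain and the expiry should reflect the managed certificate’s validity period.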

 

Summary

 

Congratulations! In this article, you have successfully added and secured a custom domain for your App Service! Your users can now reach your web site at the new domain, and their browser will let them know that the site is secured.

 

Helpful Resources

 

New Azure Migration Resources


We are excited to share several new Azure Migration-related assets to help you navigate cloud migration, learn about best practices, and understand tooling options available from Microsoft.

 

  1. Cloud Migration Simplified E-Book: Learn about the cloud migration journey, with our tried-and-true framework for a successful migration and migration best practices, along with guidance on how to navigate all of the tools and resources provided by Microsoft and partners.
  2. Azure Migrate E-Book: Learn how to discover, assess, and migrate infrastructure, applications, and data to the cloud with Microsoft’s first-party migration service, Azure Migrate.
  3. Virtual Machine Migration Learning Path: Get step-by-step instructions along with video demos on how to migrate virtual machines and apps to Azure using Azure Migrate, including guided setup, assessment, and migration.
Azure HDInsight HBase cluster performance comparison using YCSB



Prerequisites

Before you start the lab, please ensure you have the below:

  • An Azure subscription with authorization to create an HDInsight HBase cluster.
  • Access to an SSH client like PuTTY (Windows) or Terminal (macOS)

 

Provision HDInsight HBase cluster with Azure Management Portal

To provision an HDInsight HBase cluster with the new experience in the Azure portal, perform the steps below.

  1. Go to the Azure portal at portal.azure.com and sign in with your Azure account credentials.

image001.png

  • We start by creating a premium block blob storage account. From the New page, click Storage account.

image002.gif

On the Create storage account page, populate the fields below.

  • Subscription: Should be auto-populated with the subscription details

  • Resource group: Enter a resource group for holding your HDInsight HBase deployment

  • Storage account name: Enter a name for the storage account to be used by the premium cluster

  • Region: Enter the region of deployment (ensure that the cluster and storage account are in the same region)

  • Performance: Premium

  • Account kind: BlockBlobStorage

  • Replication: Locally-redundant storage (LRS)

image0015.png

 

  • Leave all other tabs at their defaults and click Review + create to create the storage account.
  • After the storage account is created, click Access keys on the left and copy key1. We will use it later in the cluster creation process.

image022.png

  • Let’s now start deploying an HDInsight HBase cluster with Accelerated Writes. Select Create a resource -> Analytics -> HDInsight.

image002.png

On the Basics tab, populate the fields below to create the HBase cluster.

  • Subscription: Should be auto-populated with the subscription details

  • Resource group: Enter a resource group for holding your HDInsight HBase deployment

  • Cluster name: Enter the cluster name. A green tick will appear if the cluster name is available.

  • Region: Enter the region of deployment

  • Cluster type: HBase – Version HBase 2.0.0 (HDI 4.0)

  • Cluster login username: Enter a username for the cluster administrator (default: admin)

  • Cluster login password: Enter a password for cluster login

  • Confirm cluster login password: Confirm the password entered in the last step

  • Secure Shell (SSH) username: Enter the SSH login username (default: sshuser)

  • Use cluster login password for SSH: Check the box to use the same password for both the SSH and Ambari logins

image003.png

Click Next: Storage to launch the Storage tab and populate the fields below.

  • Primary storage type: Azure Storage
  • Selection method: Choose the Use access key radio button
  • Storage account name: Enter the name of the premium block blob storage account created earlier
  • Access key: Enter the key1 access key you copied earlier
  • Container: HDInsight proposes a default container name. You can either accept this or enter a name of your own.

image004.gif

  • Leave the rest of the options untouched and scroll down to check the checkbox Enable HBase accelerated writes. (Note that we will later create a second cluster without accelerated writes using the same steps, but with this box unchecked.)

image010.gif

 

  • Leave the Security + networking blade at its default settings with no changes and go to the Configuration + pricing tab.

  • In the Configuration + pricing tab, note that the Node configuration section now has a line item titled Premium disks per worker node.

  • Set the number of Region nodes to 10 and the Node size to DS14v2 (you could choose a smaller number and size, but ensure that both clusters have an identical number of nodes and VM SKU to maintain parity in the comparison).

image007.png

  • Click Next: Review + Create

  • In the Review + create tab, ensure that HBase Accelerated Writes is Enabled under the Storage section.

image008.png

Repeat the same steps to create a second HDInsight HBase cluster, this time without Accelerated Writes. Note the changes below:

    • Use a normal blob storage account, as recommended by default
    • Keep the Enable HBase accelerated writes checkbox unchecked on the Storage tab.

image010.png

  • In the Configuration + pricing tab for this cluster, note that the Node configuration section does NOT have a Premium disks per worker node line item.
  • Set the number of Region nodes to 10 and the Node size to D14v2. (Also note the absence of DS-series VM types this time.)

image009.png

  • Click Create to start deploying the second cluster without Accelerated Writes.
  • Now that both cluster deployments are done, in the next section we will set up and run YCSB tests on both clusters.

 

Setup and run YCSB tests on the clusters

 Log in to the HDInsight shell

  • The steps to set up and run the YCSB tests on both clusters are identical.

  • On the cluster page in the Azure portal, navigate to SSH + Cluster login and use the hostname and SSH path to SSH into the cluster. The path has the format below:

    ssh <sshuser>@<clustername>.azurehdinsight.net 

image012.png

Create the Table

  • Run the steps below to create the HBase table that will be used to load the dataset

  • Launch the HBase shell and set a parameter for the number of table splits: 10 * the number of Region Servers. With 10 region servers per cluster, that gives 100 splits (a short illustration of the generated split keys follows the shell snippet below)

  • Create the HBase table which would be used to run the tests

  • Exit the HBase shell

hbase(main):018:0> n_splits = 100
hbase(main):019:0> create 'usertable', 'cf', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
hbase(main):020:0> exit
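For reference, the SPLITS expression above generates 100 evenly spaced row-key boundaries between user1000 and user9999 (10 splits per region server x 10 region servers). The same list, sketched in Python purely for illustration:

# mirror of the Ruby one-liner used in the HBase shell above
n_splits = 100  # 10 splits per region server x 10 region servers
split_keys = ["user{}".format(1000 + i * (9999 - 1000) // n_splits) for i in range(1, n_splits + 1)]
print(split_keys[:3], "...", split_keys[-1])
# ['user1089', 'user1179', 'user1269'] ... user9999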

Download the YCSB repo

  • Download the YCSB repository from the destination below
$ curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
  • Unzip the archive to access the contents
$ tar xfvz ycsb-0.17.0.tar.gz
  • This creates a ycsb-0.17.0 folder. Move into this folder

Run a write heavy workload in both clusters

  • Use the command below to initiate a write-heavy workload with the following parameters
    • workloads/workloada: Indicates that the workload definition workloada should be used
    • table: The name of the HBase table created earlier
    • columnfamily: The HBase column family name from the table you created
    • recordcount: Number of records to be inserted (we use 1 million)
    • threadcount: Number of threads (this can be varied, but needs to be kept constant across experiments)
    • -cp /etc/hbase/conf: Pointer to the HBase config settings
    • -s | tee -a: Provides a file name to write your output to
 bin/ycsb load hbase12 -P workloads/workloada -p table=usertable -p columnfamily=cf -p recordcount=1000000 -p threadcount=4 -cp /etc/hbase/conf -s | tee -a workloada.dat
  • Run the write-heavy workload to load 1 million rows into the previously created HBase table.

Ignore the warnings that you may see after submitting the command.

HDInsight HBase with accelerated writes

$ bin/ycsb load hbase12 -P workloads/workloada -p table=usertable -p columnfamily=cf -p recordcount=1000000 -p threadcount=4 -cp /etc/hbase/conf -s | tee -a workloada.dat

2020-01-10 16:21:40:213 10 sec: 15451 operations; 1545.1 current ops/sec; est completion in 10 minutes [INSERT: Count=15452, Max=120319, Min=1249, Avg=2312.21, 90=2625, 99=7915, 99.9=19551, 99.99=113855]
2020-01-10 16:21:50:213 20 sec: 34012 operations; 1856.1 current ops/sec; est completion in 9 minutes [INSERT: Count=18560, Max=305663, Min=1230, Avg=2146.57, 90=2341, 99=5975, 99.9=11151, 99.99=296703]
....
2020-01-10 16:30:10:213 520 sec: 972048 operations; 1866.7 current ops/sec; est completion in 15 seconds [INSERT: Count=18667, Max=91199, Min=1209, Avg=2140.52, 90=2469, 99=7091, 99.9=22591, 99.99=66239]
2020-01-10 16:30:20:214 530 sec: 988005 operations; 1595.7 current ops/sec; est completion in 7 second [INSERT: Count=15957, Max=38847, Min=1257, Avg=2502.91, 90=3707, 99=8303, 99.9=21711, 99.99=38015]
...
...
2020-01-11 00:22:06:192 564 sec: 1000000 operations; 1792.97 current ops/sec; [CLEANUP: Count=8, Max=80447, Min=5, Avg=10105.12, 90=268, 99=80447, 99.9=80447, 99.99=80447] [INSERT: Count=8512, Max=16639, Min=1200, Avg=2042.62, 90=2323, 99=6743, 99.9=11487, 99.99=16495]
[OVERALL], RunTime(ms), 564748
[OVERALL], Throughput(ops/sec), 1770.7012685303887
[TOTAL_GCS_PS_Scavenge], Count, 871
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 3116
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.5517505152740692
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 871
[TOTAL_GC_TIME], Time(ms), 3116
[TOTAL_GC_TIME_%], Time(%), 0.5517505152740692
[CLEANUP], Operations, 8
[CLEANUP], AverageLatency(us), 10105.125
[CLEANUP], MinLatency(us), 5
[CLEANUP], MaxLatency(us), 80447
[CLEANUP], 95thPercentileLatency(us), 80447
[CLEANUP], 99thPercentileLatency(us), 80447
[INSERT], Operations, 1000000
[INSERT], AverageLatency(us), 2248.752362
[INSERT], MinLatency(us), 1120
[INSERT], MaxLatency(us), 498687
[INSERT], 95thPercentileLatency(us), 3623
[INSERT], 99thPercentileLatency(us), 7375
[INSERT], Return=OK, 1000000

Explore the outcome of the test – Accelerated writes and Regular

  • The test took 564,748 milliseconds (roughly 9.4 minutes) to run, per the [OVERALL] RunTime in the output above
  • Return=OK, 1000000 indicates that all 1 million inserts were successfully written
  • Overall write throughput was about 1,770 operations per second
  • 95% of the inserts completed within 3,623 microseconds (about 3.6 ms)
  • A few inserts took much longer, likely because they were briefly blocked by the region servers under the heavy write load

HDInsight HBase without accelerated writes

2020-01-10 23:58:20:475 2574 sec: 1000000 operations; 333.72 current ops/sec; [CLEANUP: Count=8, Max=79679, Min=4, Avg=9996.38, 90=239, 99=79679, 99.9  =79679, 99.99=79679] [INSERT: Count=1426, Max=39839, Min=6136, Avg=9289.47, 90=13071, 99=27535, 99.9=38655, 99.99=39839]
[OVERALL], RunTime(ms), 2574273
[OVERALL], Throughput(ops/sec), 388.45918828344935
[TOTAL_GCS_PS_Scavenge], Count, 908
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 3208
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.12461770760133055
[TOTAL_GCS_PS_MarkSweep], Count, 0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 908
[TOTAL_GC_TIME], Time(ms), 3208
[TOTAL_GC_TIME_%], Time(%), 0.12461770760133055
[CLEANUP], Operations, 8
[CLEANUP], AverageLatency(us), 9996.375
[CLEANUP], MinLatency(us), 4
[CLEANUP], MaxLatency(us), 79679
[CLEANUP], 95thPercentileLatency(us), 79679
[CLEANUP], 99thPercentileLatency(us), 79679
[INSERT], Operations, 1000000
[INSERT], AverageLatency(us), 10285.497832
[INSERT], MinLatency(us), 5568
[INSERT], MaxLatency(us), 1307647
[INSERT], 95thPercentileLatency(us), 18751
[INSERT], 99thPercentileLatency(us), 33759
[INSERT], Return=OK, 1000000

Comparison of the HBase Write numbers with explanation

Parameter | Unit | With Accelerated Writes | Without Accelerated Writes
[OVERALL], RunTime(ms) | Milliseconds | 567478 | 2574273
[OVERALL], Throughput(ops/sec) | Operations/sec | 1770 | 388
[INSERT], Operations | # of operations | 1000000 | 1000000
[INSERT], 95thPercentileLatency(us) | Microseconds | 3623 | 18751
[INSERT], 99thPercentileLatency(us) | Microseconds | 7375 | 33759
[INSERT], Return=OK | # of records | 1000000 | 1000000
  • [OVERALL], RunTime(ms) : Total execution time in milliseconds
  • [OVERALL], Throughput(ops/sec): Number of operations per second across all threads (see the quick sanity check after this list)
  • [INSERT], Operations: Total number of insert operations, with the associated average, min, max, 95th and 99th percentile latencies listed below
  • [INSERT], 95thPercentileLatency(us): 95% of INSERT operations have a data point below this value
  • [INSERT], 99thPercentileLatency(us): 99% of INSERT operations have a data point below this value
  • [INSERT], Return=OK: Indicates that all INSERT operations were successful, with the count alongside
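As a quick sanity check on how these figures relate, the overall throughput is simply the operation count divided by the total runtime; using the accelerated-writes numbers from the run output above:

operations = 1_000_000        # [INSERT], Operations
runtime_ms = 564_748          # [OVERALL], RunTime(ms) from the run output
throughput = operations / (runtime_ms / 1000.0)
print(round(throughput, 1))   # ~1770.7, matching [OVERALL], Throughput(ops/sec)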

Try out a few other workloads and compare

Read Mostly (95% Read & 5% Write): workloadb

bin/ycsb run hbase12 -P workloads/workloadb -p table=usertable -p columnfamily=cf -p recordcount=1000000 -p operationcount=100000 -p threadcount=4 -cp /etc/hbase/conf -s | tee -a workloadb.dat
Parameter | Unit | With Accelerated Writes | Without Accelerated Writes
[OVERALL], RunTime(ms) | Milliseconds | 292029 | 374379
[OVERALL], Throughput(ops/sec) | Operations/sec | 3424 | 2537
[READ], Operations | # of operations | 949833 | 949586
[UPDATE], Operations | # of operations | 50167 | 50414
[READ], 95thPercentileLatency(us) | Microseconds | 1401 | 3395
[READ], 99thPercentileLatency(us) | Microseconds | 1387 | 3611
[READ], Return=OK | # of records | 949833 | 949586


Read Only: workloadc

bin/ycsb run hbase12 -P workloads/workloadc -p table=usertable -p columnfamily=cf -p recordcount=1000000 -p operationcount=100000 -p threadcount=4 -cp /etc/hbase/conf -s | tee -a workloadc.dat
Parameter | Unit | With Accelerated Writes | Without Accelerated Writes
[OVERALL], RunTime(ms) | Milliseconds | 272031 | 253256
[OVERALL], Throughput(ops/sec) | Operations/sec | 3676 | 3948
[READ], Operations | # of operations | 1000000 | 1000000
[READ], 95thPercentileLatency(us) | Microseconds | 1385 | 1410
[READ], 99thPercentileLatency(us) | Microseconds | 3215 | 3723
[READ], Return=OK | # of records | 1000000 | 1000000