Amazon Redshift is a quick, scalable, safe, and absolutely managed cloud information warehouse that permits you to analyze your information at scale. Amazon Redshift Serverless allows you to entry and analyze information with out the same old configurations of a provisioned information warehouse. Assets are robotically provisioned and information warehouse capability is intelligently scaled to ship quick efficiency for even essentially the most demanding and unpredictable workloads. Should you favor to handle your Amazon Redshift assets manually, you’ll be able to create provisioned clusters to your information querying wants. For extra info, confer with Amazon Redshift clusters.
Amazon Redshift offers efficiency metrics and information so you’ll be able to monitor the well being and efficiency of your provisioned clusters, serverless workgroups, and databases. The efficiency information you should use on the Amazon Redshift console falls into two classes:
- Amazon CloudWatch metrics – Helps you monitor the bodily elements of your cluster or serverless, akin to useful resource utilization, latency, and throughput.
- Question and cargo efficiency information – Helps you monitor database exercise, examine and diagnose question efficiency issues.
Amazon Redshift has launched a brand new function known as the Question profiler. The Question profiler is a graphical instrument that helps customers analyze the parts and efficiency of a question. This function is a part of the Amazon Redshift console and offers a visible and graphical illustration of the question’s run order, execution plan, and varied statistics. The Question profiler makes it simpler for customers to grasp and troubleshoot their queries.
On this submit, we cowl two frequent use instances for troubleshooting question efficiency. We present you step-by-step how you can analyze and troubleshoot long-running queries utilizing the Question profiler.
Overview
For Amazon Redshift Serverless, the Question profiler could be accessed by going to the Serverless console. Select Question and database monitoring, choose a question, after which navigate to the Question plan tab. If a question plan is out there, you’ll observe an inventory of kid queries. Select a question to view it in Question profiler.
For Amazon Redshift provisioned, the Question profiler could be accessed by going to the provisioned clusters dashboard. Select Question and masses, and select a question. Navigate to the Question plan tab. If a question plan is out there, you’ll observe an inventory of kid queries. Select a question to view it in Question profiler.
Conditions
- You should use the next pattern AWS Identification and Entry Administration (IAM) coverage to configure your IAM person or position with minimal privileges to entry Question profiler from the AWS console. In case your IAM person or position already has entry to Question and masses part of Redshift provisioned cluster dashboard or Question and database monitoring part of Redshift serverless dashboard, then no extra permissions are wanted:
- You possibly can select to make use of Question profiler in your account with an current Amazon Redshift information warehouse and queries. Nevertheless, if you want to implement this demo in your current Amazon Redshift information warehouse, obtain Redshift question editor v2 pocket book, Redshift Question profiler demo, and confer with the Information Loading part later on this submit.
- You need to connect with the cluster utilizing database credentials and grant the
sys:operator
orsys:monitor
position to the database person to view queries run by customers.
Information loading
Amazon Redshift Question Editor v2 comes with pattern information that may be loaded right into a pattern database and corresponding schema. To check Question profiler in opposition to the pattern information, load the tpcds pattern information and run queries.
- To load the tpcds pattern information, launch Redshift question editor v2 and increase the database
sample_data_dev.
- Select the icon related to the
tpcds.
- The question editor v2 then masses the info right into a schema tpcds within the database sample_data_dev.
The next screenshot reveals these steps.
- Confirm the info by operating the next pattern question, as proven within the following screenshot.
Use instances
On this submit, we describe two frequent makes use of instances round question efficiency and how you can use Question profiler to troubleshoot the efficiency points:
- Nested loop joins – This be part of sort is the slowest of the doable be part of sorts. Nested loop joins are the cross-joins and not using a be part of situation that outcome within the Cartesian product of two tables.
- Suboptimal information distribution – If information distribution is suboptimal, you may discover a big broadcast or redistribution of information throughout compute nodes when two massive tables are joined collectively.
Use case 1: Nested loop joins
To troubleshoot efficiency points with nest loop joins utilizing Question profiler, observe these steps:
- Import pocket book downloaded beforehand in conditions part of the weblog into Redshift question editor v2.
- Set the context of database to
sample_data_dev
in Question Editor v2, as proven within the following screenshot. - Run
cell #3
from demo pocket book to diagnose a question efficiency challenge associated to nested loop joins.
- Run
cell #5
to seize the question id from the SYS_QUERY_HISTORY system view filtering based mostly on the question label you set within the previous step. - On the Amazon Redshift console, within the navigation pane, choose Question and masses and select the cluster identify the place the question was initially executed, as proven within the following screenshot.
- This may open the brand new Question profiler. Underneath the Question historical past part, select
Connect with database
.After profitable connection to the database, you’ll observe the Standing exhibiting asLinked
and displaying the question historical past, as proven within the following screenshot. - You will discover your queries both by Question ID or Course of ID. Enter the Question ID captured within the previous step to filter the long-running question for additional evaluation and select the corresponding Question ID, as proven within the following screenshot.
- Underneath the Question plan part, select
Youngster question 1
, as proven within the following screenshot. If there are a number of baby queries, you’ll have to examine every one for efficiency points.
This may open the question plan in a tree view together with extra metrics on the aspect panel. This lets you shortly analyze the question streams, segments and steps. For extra details about streams, segments, and steps, confer with Question planning and execution workflow within the Amazon Redshift Database Developer Information. - Activate View streams and, within the Streams aspect panel, examine and determine which stream has the very best execution time. On this case, Streams ID 5 is the place the question spends nearly all of time, as proven within the following screenshot
- Within the Streams aspect panel, below ID, choose 5 to deal with Stream 5 for additional evaluation. Stream 5 reveals a step of Nestloop, as proven within the following screenshot.
- Select the Nestloop step to additional analyze. The aspect panel will change with step particulars and extra metrics concerning the nested loop be part of.
- By Step particulars – nestloop, we are able to examine the Enter rows and evaluate that with the Output rows, as proven within the following screenshot. On this case, because of the cross-joining with the
Store_returns
desk, 287,514 enter rows explodes to 950,233,770 rows, thus inflicting our question to run slower. - Repair the question by introducing a be part of situation between the
store_sales
andstore_returns
. Runcell #7
from Question editor v2 demo pocket book.The re-written question runs in simply 307 milliseconds.
Use case 2: Suboptimal information distribution
- To display suboptimal information distribution, change the distribution type of tables
web_sales
andweb_returns
to EVEN by operatingcell #10
of Question editor v2 demo pocket book.
- Run
cell #12
. The question takes 409 milliseconds to run, as proven by the elapsed time within the following screenshot of the Question editor v2. - Observe steps 3–10 from use case 1 to find the
query_id
and to open the Question profiler view for the previous question. - On the Question profiler web page for the previous question, activate View streams. Within the Streams aspect panel, examine and determine which stream has the very best execution time. On this case, Stream ID 6 is the place the question spends a majority of the time, as proven within the following screenshot.
- Underneath ID, choose 6 from the Streams aspect panel for additional evaluation.
Stream 6 reveals a step of hash be part of, which includes a hash be part of of two tables which can be each redistributed. This may be inferred from Hash Proper Be a part of DS_DIST_BOTH below Clarify plan node info within the following screenshot. Normally, these redistributions happen as a result of the tables aren’t joined on their distribution keys, or they don’t have the proper distribution type. Within the case of huge tables, these redistributions can result in vital efficiency degradation and, therefore, you will need to determine and repair such steps to optimize question efficiency.
- Repair this suboptimal information distribution sample by selecting the suitable distribution keys on the tables concerned:
web_sales
andweb_returns
. To alter the distribution types, runcell #14
of demo pocket book to change desk instructions. - After the previous instructions end operating, run
cell #16
to re-execute the choose question. As proven within the Question Editor within the following screenshot, now the identical question completed in 244 milliseconds after updating the distribution type to key for tablesweb_sales
andweb_returns
.
- Within the Question profiler view, activate View streams and spot that Streams 5 now took essentially the most time. It took 8 milliseconds to complete, as in comparison with 13 milliseconds within the previous step.
- Within the Streams aspect panel, below ID, choose 5 to drill down additional, then select the Hashjoin As the next screenshot reveals, after altering the distribution type to key for each
web_sales
andweb_return
tables, not one of the tables have to be redistributed on the question runtime, leading to optimized efficiency.
Concerns
Contemplate the next particulars whereas utilizing Question profiler:
- Question profiler shows info returned by the SYS_QUERY_HISTORY, SYS_QUERY_EXPLAIN, SYS_QUERY_DETAIL, and SYS_CHILD_QUERY_TEXT views.
- Question profiler solely shows question info for queries which have just lately run on the database. If a question completes utilizing a prepopulated resultset cache, Question profiler gained’t have details about it as a result of Amazon Redshift doesn’t generate a question plan for such queries.
- Queries run by Question profiler to return the question info run on the identical information warehouse because the user-defined queries.
Clear Up
To keep away from surprising prices, full the next motion to delete the assets you created:
Drop all of the tables within the sample_data_dev
below tpcds
schema.
Conclusion
On this submit, we mentioned how you can use Amazon Redshift Question profiler to watch and troubleshoot long-running queries. We demonstrated a step-by-step method to investigate question efficiency by inspecting the question execution plan and statistics and figuring out the foundation explanation for question slowness. Do that function in your atmosphere and share your suggestions with us.
Concerning the Authors
Raks Khare is a Senior Analytics Specialist Options Architect at AWS based mostly out of Pennsylvania. He helps prospects throughout various industries and areas architect information analytics options at scale on the AWS platform. Exterior of labor, he likes exploring new journey and meals locations and spending high quality time along with his household.
Blessing Bamiduro is a part of the Amazon Redshift Product Administration staff. She works with prospects to assist discover the usage of Amazon Redshift ML of their information warehouse. In her spare time, Blessing loves travels and adventures.
Ekta Ahuja is an Amazon Redshift Specialist Options Architect at AWS. She is obsessed with serving to prospects construct scalable and sturdy information and analytics options. Earlier than AWS, she labored in a number of completely different information engineering and analytics roles. Exterior of labor, she enjoys panorama pictures, touring, and board video games.