Mapping Application Topologies for Root Cause Analysis: An operations use-case

I know it has been a long while since I wrote a blog post, but given the neat technology I’m involved with these days, I couldn’t help but share this slick find!

Not too long ago, Courtney Llamas blogged about Log Analytics Cloud Services within Oracle Management Cloud (OMC), in a post that gives a great overview of its capabilities.

A brief note on OMC: Oracle Management Cloud is a heterogeneous monitoring platform that comprises (as of January 2017) four services, which communicate via its common underlying model to provide a holistic view of any type of environment (see here for a list of the latest services), regardless of whether that environment is on-premises, in any public cloud, or simply a hybrid of the two.

Log Analytics Services is not only able to collect logs from any source, it is also aware of any associations between the entities behind these logs, for example databases, servers, and middleware servers. In addition, further associations can be defined to customize the topology view of an environment or application. This is truly a superb capability, as it immediately lays out the components (servers, databases, or just entities) that comprise an environment, or a subset of it, and how they relate to one another. Whether you are an operations resource monitoring a giant dashboard, a support engineer looking for a root cause, or simply a line-of-business application owner checking on your application’s health, there’s something for everyone.
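To make the idea of entity associations a bit more concrete, here is a minimal sketch of a topology held as a simple dependency map and walked breadth-first. This is not how OMC stores or exposes its model; the entity names (`order-app`, `weblogic-1`, `orders-db`, `asm-1`, `db-host`) and the `topology` helper are made up purely for illustration.

```python
from collections import deque

# Hypothetical associations: each entity maps to the entities it depends on.
ASSOCIATIONS = {
    "order-app": ["weblogic-1", "weblogic-2"],
    "weblogic-1": ["orders-db"],
    "weblogic-2": ["orders-db"],
    "orders-db": ["asm-1", "db-host"],
    "asm-1": ["db-host"],
    "db-host": [],
}


def topology(root, associations):
    """Return every entity reachable from root, in breadth-first order."""
    seen = [root]
    queue = deque([root])
    while queue:
        entity = queue.popleft()
        for dep in associations.get(entity, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


# Everything that makes up the (made-up) "order-app" application.
print(topology("order-app", ASSOCIATIONS))
```

Starting from the application entity, the walk lists every server, database, and storage component beneath it, which is essentially the answer to "what makes up this application?".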

To put it in simpler terms: how many times does an operations resource wonder which servers or services make up an “application”? Can you imagine someone with no idea about the actual application topology being able to easily pinpoint anomalies just by looking at a topology map?

Now let’s assume there are log patterns which appear within a couple of these targets that seem a bit out of place. Not only will the Oracle Management Cloud platform send you an alert (if you choose to configure one), but it will also highlight the targets within this topology map where the anomalies were observed.

All the user now has to do is click their way to the afflicted entities within this topology – which happen to be a database and its underlying ASM instance.

Clicking on this interactive topology immediately redirects us to the associated targets, where applying a quick filter for the database and ASM instance shows a little over 14,000 log records within the specific time period.

Now, so far what you’ve seen is how Log Analytics is able to collect the logs and correlate them in a topology by associations and groups, but the slick part is its machine learning capabilities. With a simple click of a button, it can determine any outliers within the logs and display them distinctly (a feature called clustering), which helps visualize the analyzed log data.
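For a rough feel of what clustering log records means, here is a deliberately naive sketch: mask the variable parts of each message, group messages by the resulting signature, and treat rarely seen signatures as potential outliers. This is only an illustration of the general idea and not OMC’s actual algorithm; the sample messages, the `signature`, `cluster`, and `outliers` helpers, and the `max_count` threshold are all invented for this example.

```python
import re
from collections import Counter


def signature(message):
    """Collapse variable parts of a log message so similar lines group together."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
    message = re.sub(r"\d+", "<NUM>", message)
    return message


def cluster(messages):
    """Count how many messages share each signature."""
    return Counter(signature(m) for m in messages)


def outliers(clusters, max_count=1):
    """Signatures seen at most max_count times are treated as potential outliers."""
    return [sig for sig, count in clusters.items() if count <= max_count]


# Invented sample messages, not real database or ASM log output.
LOGS = [
    "Session 101 connected",
    "Session 102 connected",
    "Session 103 connected",
    "Disk 4 in diskgroup DATA is offline",
]

clusters = cluster(LOGS)
print(clusters.most_common())
print(outliers(clusters))
```

Even this crude grouping shows why clustering is useful: thousands of routine records collapse into a handful of patterns, and the one-off messages stand out.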

That, my friends, quickly highlights the relevant outliers, their occurrences, and the correlations between the database and ASM instance. In this case, it’s clear that an offline disk in a diskgroup caused upstream errors with reading datafiles for the database instance.
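The reasoning here (the ASM problem sits underneath the database errors) can be sketched with a small heuristic: among the affected entities, the likeliest root cause is one whose own dependencies are healthy. This is a simplified illustration under assumed names, not something OMC exposes; `DEPENDS_ON` and `root_cause_candidates` are hypothetical.

```python
# Hypothetical dependency map: entity -> entities it directly depends on.
DEPENDS_ON = {
    "weblogic-1": ["orders-db"],
    "orders-db": ["asm-1"],
    "asm-1": [],
}


def root_cause_candidates(affected, depends_on):
    """Return affected entities none of whose direct dependencies are also affected."""
    affected = set(affected)
    return sorted(
        entity
        for entity in affected
        if not any(dep in affected for dep in depends_on.get(entity, []))
    )


# Both the database and its ASM instance show anomalies; ASM has no affected dependency.
print(root_cause_candidates({"orders-db", "asm-1"}, DEPENDS_ON))
```

With both the database and the ASM instance flagged, the heuristic points at the ASM instance, which matches what the clustered logs showed.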

At this point, all the operations resource needs to do is pick up the phone or assign the ticket to the right subject matter expert (SME).

Be on the lookout for additional posts about OMC and how to practically use its capabilities to solve problems within business, application, and infrastructure tiers.

Cheers!

P.S. It’s good to be back 🙂
