Monday, May 4, 2015

We know the Rollup component in Abinitio is used to summarize groups of data records, so why do we use Aggregate?

- Aggregation and Rollup, both are used to summarize the data.

- Rollup is much better and convenient to use.

- Rollup can perform some additional functionality, like input filtering and output filtering of records.

- Aggregate does not expose the intermediate results held in main memory, whereas Rollup can.

- Analyzing a particular summarization is much simpler with Rollup than with Aggregate.


What kind of layouts does Abinitio support?

- Abinitio supports serial and parallel layouts.

- A single graph can contain both serial and parallel layouts at the same time.

- The parallel layout depends on the degree of data parallelism.

- A multifile system can be, for example, a 4-way parallel system.

- A component in a graph that uses such a layout runs 4-way parallel.


How do you add default rules in transformer?

The following is the process to add default rules in transformer

- Double click on the transform parameter in the parameter tab page in component properties

- Click on Edit menu in Transform editor

- Select Add Default Rules from the dropdown list box.

- It shows Match Names and Wildcard options. Select either of them.

What is a look-up?

- A lookup file represents a set of serial files / flat files

- A lookup is a specific data set that is keyed.

- The key is used for mapping values based on the data available in a particular file

- The data set can be static or dynamic.

- A hash join can be replaced by a Reformat with a lookup, provided that one of the join inputs has a small number of records and a short record length

- Abinitio has certain functions for retrieval of values using the key for the lookup
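
The idea of replacing a join with a keyed lookup can be sketched in Python (not Ab Initio DML). This is only an analogy: the record layouts and the build_lookup and enrich helpers are hypothetical, and a dictionary stands in for the keyed lookup data set.

```python
def build_lookup(records, key):
    """Index a small data set by its key field (the last duplicate key wins)."""
    return {record[key]: record for record in records}


def enrich(orders, customers_by_id):
    """Reformat-style pass: add the customer name to each order via the lookup."""
    enriched = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        name = customer["name"] if customer is not None else None
        enriched.append({**order, "customer_name": name})
    return enriched


# Hypothetical sample data: the small input becomes the lookup.
customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ben"}]
orders = [{"order": 100, "customer_id": 2}, {"order": 101, "customer_id": 9}]

result = enrich(orders, build_lookup(customers, "id"))
print(result)
```

The small data set is held in memory and probed once per record of the large input, which is why the text recommends it only when one input is small.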

What is a ramp limit?

- The limit is an integer parameter which represents a number of reject events

- The ramp parameter contains a real number representing the rate of reject events among the records processed

- The formula is: number of bad records allowed = limit + (number of records processed x ramp)

- The ramp is a value from 0 to 1, that is, a fraction of the records.

- Together, these two parameters provide the threshold for bad records.
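
The formula above can be worked through in a few lines of Python. This is a minimal sketch: the function names are hypothetical, and treating "more rejects than the threshold" as the abort condition is an assumption about how the threshold is applied.

```python
def allowed_rejects(limit: int, ramp: float, records_processed: int) -> float:
    """Threshold of bad records: limit + records_processed * ramp."""
    return limit + records_processed * ramp


def should_abort(rejects: int, limit: int, ramp: float, records_processed: int) -> bool:
    """Assumed rule: stop once the reject count exceeds the threshold."""
    return rejects > allowed_rejects(limit, ramp, records_processed)


# Example: limit = 50, ramp = 0.01, 10,000 records processed so far.
threshold = allowed_rejects(50, 0.01, 10_000)
print(threshold)  # 150.0
```

With a ramp of 0 the threshold is simply the limit, and with a limit of 0 it is purely proportional to the number of records processed.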

What is a Rollup component? Explain about it.

- Rollup component allows the users to group the records on certain field values.

- It is a multi-stage function made up of three mandatory functions: 1. Initialize 2. Rollup 3. Finalize

- To count the records of a particular group, Rollup needs a temporary variable

- The initialize function is invoked first for each group

- Rollup is called for each of the records in the group.

- The finalize function is called only once, after the last rollup call.
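
The calling order of the three stages can be illustrated with a Python sketch (not Ab Initio DML). The record fields dept and amount, and the run_rollup driver, are hypothetical; the sketch assumes finalize runs once per group, after that group's last rollup call.

```python
from itertools import groupby
from operator import itemgetter


def initialize(first_record):
    # Called once per group: set up the temporary (accumulator) variable.
    return {"count": 0, "total": 0}


def rollup(temp, record):
    # Called for every record in the group: update the temporary variable.
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp


def finalize(temp, last_record, key):
    # Called once after the group's last rollup call: build the output record.
    return {key: last_record[key], "count": temp["count"], "total": temp["total"]}


def run_rollup(records, key):
    output = []
    ordered = sorted(records, key=itemgetter(key))  # groupby needs sorted input
    for _, group in groupby(ordered, key=itemgetter(key)):
        temp, last = None, None
        for record in group:
            if temp is None:
                temp = initialize(record)
            temp = rollup(temp, record)
            last = record
        output.append(finalize(temp, last, key))
    return output


records = [
    {"dept": "A", "amount": 10},
    {"dept": "B", "amount": 5},
    {"dept": "A", "amount": 7},
]
summary = run_rollup(records, "dept")
print(summary)
```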

How to add default rules in transformer?

- Open Add Default Rules dialog box.

- Select Match Names to generate a set of rules that copy input fields to output fields with the same name

- Select Use Wildcard (.*) Rule to generate a single rule that copies input fields to output fields with the same name

- If the Transform Editor grid is not displayed, display it

- Click the Business Rule tab, then select Edit > Add Default Rules

- In the case of Reformat, nothing needs to be written in the reformat .xfr file if no real transform is needed other than reducing the set of fields.

What is the difference between partitioning with key / hash and round robin?

Partitioning by Key / Hash Partition :

- This partitioning technique is used when the key values are diverse

- Large data skew can occur when a single key value is present in large volume

- It is apt for parallel data processing

Round Robin Partition :

- This partitioning technique distributes the data uniformly across the destination partitions

- When the number of records is divisible by the number of partitions, the skew is zero.

- For example – a pack of 52 cards is distributed among 4 players in a round-robin fashion.
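
The two techniques can be compared with a small Python sketch. It is a simplification under stated assumptions: CRC32 is used here only as a convenient stable hash, the function names are hypothetical, and records are dealt out one at a time in the round-robin case, like the cards in the example.

```python
import zlib


def partition_by_key(records, key, n):
    """Send each record to the partition chosen by a hash of its key value."""
    partitions = [[] for _ in range(n)]
    for record in records:
        index = zlib.crc32(str(record[key]).encode("utf-8")) % n
        partitions[index].append(record)
    return partitions


def partition_round_robin(records, n):
    """Deal records out to the partitions in turn."""
    partitions = [[] for _ in range(n)]
    for position, record in enumerate(records):
        partitions[position % n].append(record)
    return partitions


# 52 cards dealt to 4 players: 52 is divisible by 4, so there is no skew.
deck = list(range(52))
hands = partition_round_robin(deck, 4)
print([len(hand) for hand in hands])  # [13, 13, 13, 13]

# One dominant key value: every such record lands in the same partition.
sales = [{"country": "IN"}] * 9 + [{"country": "FR"}]
by_country = partition_by_key(sales, "country", 4)
print([len(part) for part in by_country])
```

Hash partitioning keeps all records with the same key together, which is what key-based components need, at the cost of skew when one key value dominates.
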

Explain the methods to improve performance of a graph?

The following are the ways to improve the performance of a graph :

- Make sure that a limited number of components are used in a particular phase

- Set an optimum max-core value for the Sort and Join components.

- Use as few Sort components as possible

- Use as few sorted Join components as possible, replacing them with an in-memory join / hash join where needed and possible

- Pass only the needed fields through Sort, Reformat and Join components

- Use phasing or flow buffers with merges or sorted joins

- Use a sorted join when both inputs are huge; otherwise use a hash join

What is the function that transfers a string into a decimal?

- Use a decimal cast with the size in the transform() function when the string and the decimal have the same size.

- Ex: The source field is defined as string(8).

- The destination is defined as decimal(8)

- Let us assume the field name is salary.

- The rule is out.salary :: (decimal(8)) in.salary

- If the destination field is smaller than the input, the string_substring() function can be used

- Ex : Say the destination field is decimal(5) then use…

- out.field :: (decimal(5))string_lrtrim(string_substring(in.field,1,5))

- The ‘ lrtrim ‘ function is used to remove leading and trailing spaces in the string
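
The substring-then-trim-then-cast sequence can be mimicked in Python as a rough analogy. This is not Ab Initio DML: the to_decimal helper is hypothetical, and Python's Decimal stands in for the decimal(5) destination type.

```python
from decimal import Decimal


def to_decimal(text: str, width: int) -> Decimal:
    """Keep the first `width` characters, trim spaces, then convert."""
    truncated = text[:width]      # counterpart of string_substring(in.field, 1, width)
    trimmed = truncated.strip()   # counterpart of string_lrtrim(...)
    return Decimal(trimmed)       # counterpart of the (decimal(width)) cast


print(to_decimal("12345678", 5))  # 12345
print(to_decimal(" 1234567", 5))  # 1234
```

Note that the substring is taken before trimming, so leading spaces count towards the five characters, exactly as in the expression above.
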

Describe the order of evaluation of parameters.

Following is the order of evaluation:

- Host setup script will be executed first

- All common (that is, included) parameters are evaluated

- All sandbox parameters are evaluated

- The project script, project-start.ksh, is executed

- All formal parameters are evaluated

- Graph parameters are evaluated

- The Start Script of graph is executed

Explain PDL with an example?

- To make a graph behave dynamically, PDL is used

- Suppose there is a need to have a dynamic field that is to be added to a predefined DML while executing the graph

- Then a graph level parameter can be defined

- Utilize this parameter while embedding the DML in output port.

- For example: define a parameter named myfield with the value "string("|") name;"

- Use ${myfield} when embedding the DML in the out port.

- Use $substitution as an interpretation option
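
The substitution step can be pictured with Python's string.Template, which happens to use the same ${name} placeholder syntax. This is only an analogy, not how Ab Initio resolves parameters: the record layout, the id field and the resolve_dml helper are hypothetical, and the parameter value is assumed to be a single field declaration.

```python
from string import Template

# Hypothetical DML text with a placeholder where the dynamic field goes.
DML_TEMPLATE = """record
  string(",") id;
  ${myfield}
end;"""


def resolve_dml(template_text, parameters):
    """Replace each ${name} placeholder with the matching parameter value."""
    return Template(template_text).substitute(parameters)


resolved = resolve_dml(DML_TEMPLATE, {"myfield": 'string("|") name;'})
print(resolved)
```

Changing the parameter value changes the record format without editing the graph, which is the dynamic behaviour described above.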

State the working process of decimal_strip function.

- decimal_strip extracts the decimal value from the data, discarding the non-numeric characters.

- It trims any leading zeros

- The result is a valid decimal number

Ex:
decimal_strip("-0184o") := "-184"
decimal_strip("oxyas97abc") := "97"
decimal_strip("+$78ab=-*&^*&%cdw") := "78"
decimal_strip("Honda") := "0"
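
The behaviour shown in these four examples can be approximated in Python. This is a sketch that reproduces only the examples above, not the real function: the name decimal_strip_sketch is hypothetical, a minus sign is assumed to count only when it appears before the first digit, and decimal points are ignored.

```python
import re


def decimal_strip_sketch(text: str) -> str:
    """Keep the digits, drop leading zeros, and keep a leading minus sign."""
    first_digit = re.search(r"[0-9]", text)
    if first_digit is None:
        return "0"  # no digits at all
    negative = "-" in text[:first_digit.start()]
    digits = "".join(ch for ch in text if ch in "0123456789")
    digits = digits.lstrip("0") or "0"
    if negative and digits != "0":
        return "-" + digits
    return digits


for sample in ["-0184o", "oxyas97abc", "+$78ab=-*&^*&%cdw", "Honda"]:
    print(sample, "->", decimal_strip_sketch(sample))
```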

State the first_defined function with an example.

- This function is similar to the function NVL() in Oracle database

- It returns the first value that is not NULL among the values passed to the function, and that value is assigned to the variable

Example: A set of variables, say v1,v2,v3,v4,v5,v6 are assigned with NULL.
Another variable num is assigned with value 340 (num=340)
num = first_defined(NULL, v1,v2,v3,v4,v5,v6,num)
The result of num is 340
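
The same idea in Python, with None standing in for NULL. This is a minimal sketch of the behaviour described above, written as a variadic helper for illustration; it makes no claim about the argument list the real function accepts.

```python
def first_defined(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


v1 = v2 = v3 = v4 = v5 = v6 = None
num = 340
num = first_defined(None, v1, v2, v3, v4, v5, v6, num)
print(num)  # 340
```

Only None is skipped: a defined value such as 0 or an empty string is still returned.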

What is MAX CORE of a component?

- MAX CORE is the amount of memory a component is allowed to use for its calculations

- Each component has a different MAX CORE

- Component performance is influenced by the MAX CORE setting

- The process may slow down or speed up depending on the MAX CORE value, so a wrong value can hurt performance

What are the operations that support avoiding duplicate record?

Duplicate records can be avoided by using the following:

- Using the Dedup Sorted component

- Performing aggregation

- Utilizing the Rollup component
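
Removing duplicates from key-sorted data can be sketched in Python as follows. The dedup_sorted helper, its keep option and the sample records are hypothetical; the sketch keeps one record per key value, which is also what a rollup or aggregation on that key achieves.

```python
from itertools import groupby
from operator import itemgetter


def dedup_sorted(records, key, keep="first"):
    """Sort by key, then keep the first (or last) record of each key group."""
    ordered = sorted(records, key=itemgetter(key))  # stable sort keeps input order
    unique = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        items = list(group)
        unique.append(items[0] if keep == "first" else items[-1])
    return unique


rows = [
    {"id": 2, "city": "Pune"},
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": "Mumbai"},
]
print(dedup_sorted(rows, "id"))
print(dedup_sorted(rows, "id", keep="last"))
```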


What parallelisms does Abinitio support?

AbInitio supports three types of parallelism. They are

- Data Parallelism : The data is divided into partitions and the same operation works on each partition in parallel within a single application

- Component Parallelism : Different components work on different data in parallel within a single application

- Pipeline Parallelism : Data is passed from one component to the next, and both components work on the data at the same time.

State the relation between EME, GDE and Co>Operating System.

EME:

- EME stands for Enterprise Meta>Environment

- It is the repository for AbInitio. It holds transformations, database configuration files, metadata and target information

GDE:

- GDE – Graphical Development Environment

- It is an end user environment. Graphs are developed in this environment

- It provides GUI for editing and executing AbInitio programs

Co>Operating System:

- The Co>Operating System is the server of AbInitio.

- It is installed on a specific OS platform, known as the native OS.

- All graphs developed in GDE are later deployed and executed in the Co>Operating System



What is a deadlock and how it occurs?

- A graph or program hang is known as a deadlock.

- The progress of the program stops when a deadlock occurs.

- Certain data flow patterns are likely to cause a deadlock

- If the flows of a graph diverge and converge within a single phase, there is potential for a deadlock

- Where the flows converge, a component might wait for records to arrive on one flow while unread data accumulates on the others.

- In GDE version 1.8, the occurrence of a deadlock is very rare

What is the difference between check point and phase?

Check point:

- A checkpoint is a recovery point: when a graph fails in the middle of the process, it can be recovered from the checkpoint

- The rest of the process is continued from the checkpoint

- Data from the checkpoint is fetched and execution continues after the problem is corrected.

Phase:

- If a graph is created with phases, each phase is assigned to some part of memory one after another.

- All the phases will run one by one

- The intermediate file will be deleted



