Friday, August 31, 2018

BULK COLLECT & FORALL vs. CURSOR & FOR-LOOP



CREATE OR REPLACE PROCEDURE CURSOR_FOR_OPEN_ORA
 IS
 l_ename emp.ename%TYPE;
 l_empno emp.empno%TYPE;
 l_sal   emp.sal%TYPE;
 CURSOR cur IS SELECT ename, empno, sal FROM emp;
 BEGIN
 dbms_output.put_line('Before CURSOR OPEN: ' || systimestamp);
 OPEN cur;
 dbms_output.put_line('Before LOOP: ' || systimestamp);
 LOOP
 FETCH cur INTO l_ename, l_empno, l_sal;
 EXIT WHEN cur%NOTFOUND;
 INSERT INTO emp1(ename, empno, sal) VALUES (l_ename, l_empno, l_sal);
 END LOOP;
 CLOSE cur;
 dbms_output.put_line('After CURSOR CLOSE: ' || systimestamp);
 COMMIT;
 END;
/

 Procedure created.

SQL> exec CURSOR_FOR_OPEN_ORA();
Before CURSOR OPEN: 31-AUG-18 03.19.22.033000000 PM +05:30
Before LOOP: 31-AUG-18 03.19.22.033000000 PM +05:30
After CURSOR CLOSE: 31-AUG-18 03.19.22.033000000 PM +05:30

PL/SQL procedure successfully completed.
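
The cursor loop above fetches and inserts one row at a time. For comparison, here is a BULK COLLECT / FORALL version of the same copy, a sketch assuming the same emp and emp1 tables: it fetches all rows into collections in a single context switch and inserts them with one bulk statement, which shows a clear timing difference on larger tables.

CREATE OR REPLACE PROCEDURE BULK_FORALL_ORA
 IS
 TYPE t_ename_tab IS TABLE OF emp.ename%TYPE;
 TYPE t_empno_tab IS TABLE OF emp.empno%TYPE;
 TYPE t_sal_tab   IS TABLE OF emp.sal%TYPE;
 l_enames t_ename_tab;
 l_empnos t_empno_tab;
 l_sals   t_sal_tab;
 BEGIN
 dbms_output.put_line('Before BULK COLLECT: ' || systimestamp);
 -- one fetch brings the whole result set into the collections
 SELECT ename, empno, sal
   BULK COLLECT INTO l_enames, l_empnos, l_sals
   FROM emp;
 dbms_output.put_line('Before FORALL: ' || systimestamp);
 -- one bulk-bind insert instead of a row-by-row loop
 FORALL i IN 1 .. l_enames.COUNT
   INSERT INTO emp1(ename, empno, sal)
   VALUES (l_enames(i), l_empnos(i), l_sals(i));
 dbms_output.put_line('After FORALL: ' || systimestamp);
 COMMIT;
 END;
/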

Thursday, August 30, 2018

Informatica Versions History

  • Informatica Powercenter 4.1
  • Informatica Powercenter 5.1
  • Informatica Powercenter 6.1.2
  • Informatica Powercenter 7.1.2
  • Informatica Powercenter 8.1
  • Informatica Powercenter 8.5
  • Informatica Powercenter 8.6
  • Informatica Powercenter 9.1
  • Informatica Powercenter 10

Tuesday, August 28, 2018

How to find the top three highest salaries in the emp table in Oracle?



select empno, sal from emp e
     where 3 > ( select count(distinct sal) from emp
      where e.sal < sal )


Another way :

    select * from
    (
    select empno, sal,
    rank() over (order by sal desc) as rnk from emp )
    where rnk <= 3;

Another Way :

    select * from
    (
    select empno, sal from emp
    order by sal desc
    )
    where rownum <= 3;
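
On Oracle 12c and later, the row-limiting clause gives the same result as the ROWNUM version in a single query:

    select empno, sal from emp
    order by sal desc
    fetch first 3 rows only;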


Saturday, August 25, 2018

Informatica PowerCenter

Informatica PowerCenter consists of three main components:

Client tools: installed on developer machines.
PowerCenter Repository: the place where an application's metadata is stored.
PowerCenter Server: the server that performs the data executions.
With a growing customer base, Informatica continues to extend its data integration solutions. The tool has powerful built-in mapping templates to help manage data efficiently.



There are mainly 4 steps in the Informatica ETL process,


  1. Extract or Capture
  2. Scrub or Clean
  3. Transform
  4. Load and Index

Features of Informatica ETL: 



For all the Data integration and ETL operations, Informatica has provided us with Informatica PowerCenter. Let us now see some key features of Informatica ETL:
  • Provides facility to specify a large number of transformation rules with a GUI.
  • Generate programs to transform data.
  • Handle multiple data sources.
  • Supports data extraction, cleansing, aggregation, reorganisation, transformation, and load operations.
  • Automatically generates programs for data extraction.
  • High-speed loading of target data warehouses.

Data warehouse design methods



In addition to Inmon's top-down approach to data warehouses and Kimball's bottom-up method, some organizations have also adopted hybrid options.
  • Top-down approach: Inmon's method calls for building the data warehouse first. Data is extracted from operational and possibly third-party external systems and may be validated in a staging area before being integrated into a normalized data model. Data marts are created from the data stored in the data warehouse.


  • Bottom-up method: Kimball's data warehousing architecture calls for dimensional data marts to be created first. Data is extracted from operational systems, moved to a staging area and modeled into a star schema design, with one or more fact tables connected to one or more dimensional tables. The data is then processed and loaded into data marts, each of which focuses on a specific business process. Data marts are integrated using a data warehouse bus architecture to form an enterprise data warehouse.



  • Hybrid method: Hybrid approaches to data warehouse design include aspects from both the top-down and bottom-up methods. Organizations often seek to combine the speed of the bottom-up approach with the integration achieved in a top-down design.

Data warehouses vs. databases vs. data lakes

Databases and data lakes are often confused with data warehouses, but there are important differences.
While data warehouses typically store data from multiple sources and use predefined schemas designed for data analytics, a database is generally used to capture and store data from a single source, such as a transactional system, and its schema is normalized. Databases aren't designed to run analytical queries across very large data sets.

Sunday, August 19, 2018

Types of Data Warehouse

Information processing, analytical processing, and data mining are the three types of data warehouse applications that are discussed below:


  1.  Information Processing – A data warehouse allows us to process the data stored in it. The data can be processed by querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
  2.  Analytical Processing – A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by basic OLAP operations, including slice-and-dice, drill-down, drill-up, and pivoting.
  3.  Data Mining – Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.
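
As a sketch of what these OLAP operations look like in SQL, a GROUP BY ROLLUP over a hypothetical sales(region, product, amount) table computes the drill-up subtotals at every level of the hierarchy in one query:

    SELECT region, product, SUM(amount) AS total_sales
    FROM sales
    GROUP BY ROLLUP (region, product);

Rows where product is NULL are region subtotals, and the row with both columns NULL is the grand total.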

OLTP vs OLAP: What's the Difference?


What is OLTP?

Online transaction processing (OLTP) supports transaction-oriented applications in a 3-tier architecture. OLTP administers the day-to-day transactions of an organization.

The primary objective is data processing, not data analysis.

Examples of OLTP systems are:

  • Online banking
  • Online airline ticket booking
  • Sending a text message
  • Order entry

Benefits of the OLTP method:

  • It administers the daily transactions of an organization.
  • OLTP widens the customer base of an organization by simplifying individual processes.


What is OLAP?



Online Analytical Processing (OLAP) is a category of software tools that provides analysis of data for business decisions. OLAP systems allow users to analyze information from multiple database systems at the same time.

The primary objective is data analysis, not data processing.

Example of OLAP

Any data warehouse system is an OLAP system. Uses of OLAP include the following:

  • A company might compare its mobile phone sales in September with sales in October, then compare those results with sales at another location, which may be stored in a separate database.
  • Amazon analyzes purchases by its customers to build a personalized homepage of products likely to interest the customer.

Benefits of using OLAP services:

  • OLAP creates a single platform for all types of business analytical needs, including planning, budgeting, forecasting, and analysis.
  • The main benefit of OLAP is the consistency of information and calculations.
  • Security restrictions can easily be applied to users and objects to comply with regulations and protect sensitive data.



Data Warehouse Concepts, Architecture and Components


What is Data warehouse?

A data warehouse is an information system that contains historical and cumulative data from single or multiple sources. It simplifies an organization's reporting and analysis processes.

It also serves as a single version of the truth for a company's decision making and forecasting.

In this tutorial, you will learn:

  • What is Data warehouse?
  • Characteristics of Data warehouse
    • Subject-Oriented
    • Integrated
    • Time-Variant
    • Non-volatile
  • Data Warehouse Architectures
  • Data Warehouse Components
    • Data Warehouse Database
    • Sourcing, Acquisition, Clean-up and Transformation Tools (ETL)
    • Metadata
    • Query Tools
    • Data Warehouse Bus Architecture
  • Data Marts
  • Data Warehouse Architecture Best Practices

Friday, August 17, 2018

List of ETL tools

  1. Oracle Warehouse Builder (OWB)
  2. SAP Data Services
  3. IBM Infosphere Information Server
  4. SAS Data Management
  5. Informatica PowerCenter
  6. Elixir Repertoire for Data ETL
  7. Data Migrator (IBI)
  8. SQL Server Integration Services (SSIS)
  9. Talend Studio for Data Integration
  10. Sagent Data Flow
  11. Actian DataConnect
  12. Open Text Integration Center
  13. Oracle Data Integrator (ODI)
  14. Cognos Data Manager
  15. CloverETL
  16. Centerprise Data Integrator
  17. IBM Infosphere Warehouse Edition
  18. Pentaho Data Integration
  19. Adeptia Integration Server
  20. Syncsort DMX
  21. QlikView Expressor
  22. Relational Junction ETL Manager (Sesame Software)

Monday, August 13, 2018

The Python topics Go From Beginner To Advanced!

What Will I Learn?
  • Understand the basic as well as core concepts of the Python programming language
  • Acquire enhanced skills with Python graphics
  • Design, create, develop, and code Python games
  • Read and write external data files using Python
  • Implement functions and call built-in Python functions
  • Debug an entire program and handle errors in Python
  • Implement basic Python structures such as if statements, loops, and functions

Learn Data Science

Learn Data Science in R from Scratch

Table of Contents

  1. Basics of R Programming for Data Science
    • Why learn R ?
    • How to install R / R Studio ?
    • How to install R packages ?
    • Basic computations in R
  2. Essentials of R Programming
    • Data Types and Objects in R
    • Control Structures (Functions) in R
    • Useful R Packages
  3. Exploratory Data Analysis in R
    • Basic Graphs
    • Treating Missing values
    • Working with Continuous and Categorical Variables
  4. Data Manipulation in R
    • Feature Engineering
    • Label Encoding / One Hot Encoding
  5. Predictive Modeling using Machine Learning in R
    • Linear Regression
    • Decision Tree
    • Random Forest

Let’s get started!
Note: The data set used in this article is from Big Mart Sales Prediction.

1. Basics of R Programming

Why learn R ?

I don’t know if I have a solid reason to convince you, but let me share what got me started. I had no prior coding experience; in fact, I never studied computer science. I came to know that to learn data science, one must learn either R or Python as a starter. I chose the former. Here are some benefits I found after using R:
  1. The style of coding is quite easy.
  2. It’s open source. No need to pay any subscription charges.
  3. Instant access to over 7,800 packages customized for various computation tasks.
  4. The community support is overwhelming. There are numerous forums to help you out.
  5. High-performance computing experience (with the right packages).
  6. One of the skills most highly sought by analytics and data science companies.
There are many more benefits, but these are the ones that have kept me going. If you think they are exciting, stick around and move to the next section. And if you aren’t convinced, you may like the Complete Python Tutorial from Scratch.


More: https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/