Data Science Basics and Introduction to Microsoft AI Platform



What is AI? Systems that perform tasks that normally require human intelligence, such as:

visual perception

speech recognition

decision-making

translating between languages, etc.


History 

In 1950, Alan Turing proposed: if a machine could carry on a conversation that was indistinguishable from a conversation with a human being, then it was reasonable to say that the machine was "thinking".


1956 - Dartmouth conference.

1970s - AI winter.

1990s - ML becomes a separate field.

1997 - IBM's Deep Blue beats the world champion at chess.

2010s - ML becomes integral to many widely used services.


Data and an algorithm give you a model.

business understanding > data acquisition & understanding > modeling > deployment.

training set (usually 70-80% of data)

validation set (usually 10-20% of data)

test set (usually 10-20% of data)
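The three-way split above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `split_dataset`, the 70/15/15 fractions and the fixed seed are assumptions chosen to fall inside the ranges in these notes.

```python
import random

def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into training, validation and test sets."""
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever is left becomes the test set
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the rows are stored in some order (by date, by class), otherwise the test set would not look like the training set.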

training data > algorithm > model (the result)


Types of algorithms

supervised alg :- the data is already labeled. used to predict a target attribute/label.

unsupervised alg :- the data has no labels and no target attribute. finds patterns among the input data. clustering is a typical problem of this kind.

semi supervised :- combination of supervised and unsupervised alg: a small amount of labeled data plus a lot of unlabeled data. eg. a photo archive where only some of the photos are labeled.


Algorithms

1. Regression (supervised)

Linear regression :- y = mx + c, where m (slope) and c (intercept) are constants learned from the training data.
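A minimal sketch of how m and c in y = mx + c can be found, assuming ordinary least squares with a single input attribute. The function name `fit_line` and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c for one input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x   # the fitted line passes through the mean point
    return m, c

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # illustrative data lying exactly on y = 2x + 1
m, c = fit_line(xs, ys)
prediction = m * 5.0 + c       # continuous value prediction for an unseen x
print(m, c, prediction)
```

The whole "training" step is two sums and a division, which is why the training time is low.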


adv.

simple 

based on math

lower training time

continuous value prediction


dis adv.

cannot solve complex (non-linear) problems


uses

prediction and forecasting

stock price prediction

credit assessment


2. Classification 

    used to predict categorical response values, where the data can be separated into specific "classes".

    Two-class classification :- predicts between two categories.

    Multi-class classification :- predicts between several categories.
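To make the two-class / multi-class distinction concrete, here is a small sketch using a nearest-centroid classifier. That technique is swapped in here only because it is short; these notes do not prescribe it. The function names, the labels ("ham", "spam", "promo") and the 2-D feature values are all made up.

```python
from collections import defaultdict

def fit_centroids(samples):
    """samples: list of ((x, y), label). Returns {label: average point of that class}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])   # sum_x, sum_y, count per label
    for (x, y), label in samples:
        t = totals[label]
        t[0] += x
        t[1] += y
        t[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in totals.items()}

def predict(centroids, point):
    """Pick the class whose centroid is closest to the point."""
    px, py = point
    return min(
        centroids,
        key=lambda label: (centroids[label][0] - px) ** 2 + (centroids[label][1] - py) ** 2,
    )

# Two-class: every sample belongs to one of two categories.
two_class = [((1, 1), "ham"), ((1, 2), "ham"), ((5, 5), "spam"), ((6, 5), "spam")]
binary_model = fit_centroids(two_class)
binary_answer = predict(binary_model, (2, 1))

# Multi-class: same code, one more category in the training data.
multi_class = two_class + [((9, 1), "promo"), ((10, 2), "promo")]
multi_model = fit_centroids(multi_class)
multi_answer = predict(multi_model, (9, 2))

print(binary_answer, multi_answer)
```

The only difference between the two cases is how many distinct labels appear in the training data; the prediction is always exactly one class.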

    

3. Decision Tree 

    flowchart-like structure.

    drawn upside down, with the root at the top.
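The flowchart idea can be sketched with a hand-written tree. This is only an illustration of how a finished tree is used to classify: the tree below is typed in by hand rather than learned from data, and the attributes ("outlook", "windy") and labels are hypothetical.

```python
# Each inner node asks about one attribute; each leaf is a class label.
tree = {
    "question": "outlook",                 # root, drawn at the top
    "branches": {
        "sunny": {
            "question": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
        "overcast": "play",
    },
}

def classify(node, sample):
    """Walk from the root down to a leaf, following the sample's answers."""
    while isinstance(node, dict):
        answer = sample[node["question"]]
        node = node["branches"][answer]
    return node

result = classify(tree, {"outlook": "sunny", "windy": False})
print(result)
```

Classification is just a short walk down the tree, and the path taken doubles as the explanation for the answer.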


adv.

white box, easy to interpret and explain.

useful for finding the most important attributes.

not much affected by outliers, so less data cleaning is required.

once created, can provide fast classification.

 

dis adv.

works better with discrete values than with continuous values.

requires a lot of prior data.

limited to one output, without a probability.

not great at regression.


uses

astronomy

financial analysis

power systems


4. Clustering (unsupervised)

    4.1. K-means


adv.

no need for labeled (classified) input data.

no need to have information about attributes.


dis adv.

results can differ between 2 successive runs (the starting means are usually picked at random).

hard to find good starting means.

the number of clusters (k) must be specified in advance.
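The drawbacks above show up even in a minimal k-means. In this sketch the starting means are passed in explicitly so the effect is reproducible; that is an assumption made for illustration, since they are normally drawn at random. The function name `kmeans` and the six points are made up.

```python
def kmeans(points, start_means, iterations=10):
    """Minimal k-means on 2-D points. k is fixed by len(start_means)."""
    means = list(start_means)
    k = len(means)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest mean (ties go to the lower index).
        labels = [
            min(range(k), key=lambda j: (x - means[j][0]) ** 2 + (y - means[j][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each mean to the average of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = (sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members))
    return labels, means

# Two obvious groups: three points near (0, 0) and three near (10, 10).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10)]

labels_a, _ = kmeans(points, [(0, 0), (10, 10)])   # one starting mean per group
labels_b, _ = kmeans(points, [(0, 1), (1, 0)])     # both starting means in one group

print(labels_a)  # [0, 0, 0, 1, 1, 1]
print(labels_b)  # [0, 0, 1, 0, 1, 0]
```

The second run settles on clusters that mix the two groups and never recovers, which is why the choice of starting means matters and why two runs with random starts can disagree.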


uses

E-commerce

credit

super-markets

document classification


5. Neural Network

    Artificial Neural Networks (ANN)

    

adv. 

works well even with continuous-valued attributes.

can handle 1000s of attributes.

no need to understand the domain/problem

higher accuracy 


dis. adv. 

black box, not possible to check how a result was reached.

training takes a long time.

a lack of domain knowledge is sometimes exposed only later.


uses

predictive analysis

click stream analysis

fraud detection 

image recognition
