Wednesday, December 20, 2023

22BCS306A-Object Oriented Programming with JAVA

1. Syllabus

 

2. Text Book  :  


3. Notes & Presentations :  

( Materials Courtesy - Thanks to : https://sites.google.com/site/harishbitcse/lecture-notes/22bcs306a-object-oriented-programming-with-java )




Tuesday, November 7, 2023

Big Data Tutorial – Learn Big Data from Scratch and Become a Pro

Are you ready to embark on a journey to unlock the secrets of Big Data? Our Big Data tutorial is the perfect starting point for anyone looking to master the art of big data analysis. Imagine being able to process, analyze and visualize vast amounts of data with ease. Imagine being able to turn data into actionable insights that drive business success. With our comprehensive tutorial, you’ll learn how to do just that.

What is Big Data?

Big Data refers to data that cannot be managed using traditional databases. Per Gartner's definition, data sets of huge volume, generated in many varieties and at high velocity, are termed Big Data. Volume, variety, and velocity are the 3 Vs of Big Data.

These huge volumes of data can be used to uncover advanced patterns and address business problems that could not be handled earlier.


Big Data Analytics- BDA - 18CS72 notes

 Big Data Analytics Materials: BDA - 18CS72


Links to download notes:





( Note: Thanks to the original author...!!!   )

ML – Candidate Elimination Algorithm


The candidate elimination algorithm incrementally builds the version space given a hypothesis space H and a set E of examples. The examples are added one by one; each example possibly shrinks the version space by removing the hypotheses that are inconsistent with the example. The candidate elimination algorithm does this by updating the general and specific boundary for each new example. 

  • It can be considered an extended form of the Find-S algorithm.
  • It considers both positive and negative examples.
  • Positive examples are handled as in Find-S: they generalize the specific boundary of the hypothesis space.
  • Negative examples, in contrast, are used to specialize the general boundary.

Terms Used:  

  • Concept learning: the basic learning task of the machine, i.e., inferring a concept from the training data.
  • General Hypothesis: a hypothesis that places no constraints on the attributes.
  • G = {'?', '?', '?', '?', …}: one '?' per attribute.
  • Specific Hypothesis: a hypothesis that constrains the attributes to specific values.
  • S = {'ϕ', 'ϕ', 'ϕ', …}: one ϕ per attribute.
  • Version Space: the set of hypotheses lying between the general and the specific boundary. It contains not just one hypothesis but all hypotheses consistent with the training data set (a short worked sketch follows this list).
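
To make the boundary updates concrete, here is a minimal hand-worked sketch in Python. The two-attribute weather domain and the example values are assumed purely for illustration and are not taken from the lab's input.csv:

# Hypothetical two-attribute domain (Sky, Temperature) -- illustration only.
G = {('?', '?')}            # general boundary: covers every example
S = {('0', '0')}            # specific boundary: covers no example yet

# A positive example ('Sunny', 'Warm') forces S to generalize just enough
# to cover it; G already covers it and stays unchanged.
S = {('Sunny', 'Warm')}

# A negative example ('Rainy', 'Cold') forces G to specialize just enough
# to exclude it while still remaining more general than S.
G = {('Sunny', '?'), ('?', 'Warm')}

print("G:", G)    # {('Sunny', '?'), ('?', 'Warm')}
print("S:", S)    # {('Sunny', 'Warm')}

The version space is every hypothesis that is at least as general as some member of S and at least as specific as some member of G.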

Advantages of CEA over Find-S:

  1. Improved accuracy: CEA considers both positive and negative examples to generate the hypothesis, which can result in higher accuracy when dealing with noisy or incomplete data.
  2. Flexibility: CEA can handle more complex classification tasks, such as those with multiple classes or non-linear decision boundaries.
  3. More efficient: CEA reduces the number of hypotheses by generating a set of general hypotheses and then eliminating them one by one. This can result in faster processing and improved efficiency.
  4. Better handling of continuous attributes: CEA can handle continuous attributes by creating boundaries for each attribute, which makes it more suitable for a wider range of datasets.

Disadvantages of CEA in comparison with Find-S:

  1. More complex: CEA is a more complex algorithm than Find-S, which may make it more difficult for beginners or those without a strong background in machine learning to use and understand.
  2. Higher memory requirements: CEA requires more memory to store the set of hypotheses and boundaries, which may make it less suitable for memory-constrained environments.
  3. Slower processing for large datasets: CEA may become slower for larger datasets due to the increased number of hypotheses generated.
  4. Higher potential for overfitting: The increased complexity of CEA may make it more prone to overfitting on the training data, especially if the dataset is small or has a high degree of noise.
Code:

import csv

# Read the training examples; each row of input.csv becomes a tuple
# whose last element is the class label.
with open('input.csv') as csvfile:
    ex = [tuple(line) for line in csv.reader(csvfile)]

print(ex)

def g_0(n):
    # Most general hypothesis: '?' for every attribute.
    return ("?",) * n

def s_0(n):
    # Most specific hypothesis: '0' for every attribute.
    return ('0',) * n

def more_general(h1, h2):
    # True if hypothesis h1 is more general than (or equal to) h2.
    more_general_parts = []
    for x, y in zip(h1, h2):
        mg = x == "?" or (x != "0" and (x == y or y == "0"))
        more_general_parts.append(mg)
    return all(more_general_parts)

def min_generalizations(h, x):
    # Minimally generalize h so that it covers the positive example x.
    h_new = list(h)
    for i in range(len(h)):
        if not more_general(h[i:i+1], x[i:i+1]):
            h_new[i] = '?' if h[i] != '0' else x[i]
    return [tuple(h_new)]

def min_specializations(h, domains, x):
    # Minimally specialize h so that it excludes the negative example x.
    results = []
    for i in range(len(h)):
        if h[i] == "?":
            for val in domains[i]:
                if x[i] != val:
                    h_new = h[:i] + (val,) + h[i+1:]
                    results.append(h_new)
        elif h[i] != "0":
            h_new = h[:i] + ('0',) + h[i+1:]
            results.append(h_new)
    return results

def get_domains(ex):
    # Collect the set of values observed for each column (attributes and label).
    d = [set() for i in ex[0]]
    for x in ex:
        for i, xi in enumerate(x):
            d[i].add(xi)
    return [list(sorted(x)) for x in d]

def candidate_elimination(ex):
    domains = get_domains(ex)[:-1]      # drop the domain of the class-label column
    G = set([g_0(len(domains))])
    S = set([s_0(len(domains))])
    i = 0
    print("G[{0}]:".format(i), G)
    print("S[{0}]:".format(i), S)
    for xcx in ex:
        i = i + 1
        x, cx = xcx[:-1], xcx[-1]       # split attributes and class label
        if cx == 'YES':                 # positive example: prune G, generalize S
            G = {g for g in G if more_general(g, x)}
            S = generalize_S(x, G, S)
        else:                           # negative example: prune S, specialize G
            S = {s for s in S if not more_general(s, x)}
            G = specialize_G(x, domains, G, S)
        print("G[{0}]:".format(i), G)
        print("S[{0}]:".format(i), S)
    return

def generalize_S(x, G, S):
    S_prev = list(S)
    for s in S_prev:
        if s not in S:
            continue
        if not more_general(s, x):
            S.remove(s)
            Splus = min_generalizations(s, x)
            # Keep only generalizations that are covered by some member of G.
            S.update([h for h in Splus if any([more_general(g, h) for g in G])])
            # Remove hypotheses that are more general than another member of S.
            S.difference_update([h for h in S
                                 if any([more_general(h, h1) for h1 in S if h != h1])])
    return S

def specialize_G(x, domains, G, S):
    G_prev = list(G)
    for g in G_prev:
        if g not in G:
            continue
        if more_general(g, x):
            G.remove(g)
            Gminus = min_specializations(g, domains, x)
            # Keep only specializations that still cover some member of S.
            G.update([h for h in Gminus if any([more_general(h, s) for s in S])])
            # Remove hypotheses that are less general than another member of G.
            G.difference_update([h for h in G
                                 if any([more_general(g1, h) for g1 in G if h != g1])])
    return G

candidate_elimination(ex)



Input:




Output:





ML - Find S Algorithm

Introduction: 

The Find-S algorithm is a basic concept learning algorithm in machine learning. It finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples. Find-S starts with the most specific hypothesis and generalizes it each time it fails to classify an observed positive training example. Hence, the Find-S algorithm moves from the most specific hypothesis towards more general hypotheses.


Important Representation : 

  1. ? indicates that any value is acceptable for the attribute (a small coverage check using this notation is sketched after this list).
  2. A single specific value (e.g., Cold) indicates that exactly that value is required for the attribute.
  3. ϕ indicates that no value is acceptable.
  4. The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
  5. The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
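
The following is a minimal sketch of how a hypothesis written in this notation can be checked against an example. The covers helper and the example values are hypothetical and are not part of the lab code below:

def covers(hypothesis, example):
    # '?' accepts any value; a specific value must match exactly.
    # A hypothesis with ϕ (written '0' in the code below) in any position covers no real example.
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

h = ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print(covers(h, ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change']))   # True
print(covers(h, ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change']))   # False
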
Steps Involved In Find-S : 

  1. Start with the most specific hypothesis. 
    h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
  2. Take the next example and if it is negative, then no changes occur to the hypothesis.
  3. If the example is positive and the current hypothesis is too specific, then generalize the hypothesis just enough to cover it.
  4. Keep repeating the above steps until all the training examples have been processed.
  5. After all the training examples have been processed, we have the final hypothesis, which can be used to classify new examples.
Code Snippet:

# coding: utf-8

# # Find-S Algorithm:
# ## Algorithm:
# 1. Initialize h to the most specific hypothesis in H
# 2. For each positive training instance x
#         i. For each attribute constraint a i in h :
#             a. If the constraint a i in h is satisfied by x Then do nothing
#             b. Else replace a i in h by the next more general constraint that is satisfied by x
# 3. Output hypothesis h

# In[1]:


import csv


# ### Read File:
# Load the csv file and append each row to a list
# Also print the row to see the dataset (optional)

# In[ ]:


a=[]
with open('finds.csv') as csfile:
    reader = csv.reader(csfile)
    for row in reader:
        a.append(row)
        print(row)
num_attributes=len(a[0])-1


# 1. The most general hypothesis is represented by:
#     ```['?', '?', '?', '?', '?', '?']```
# 2. The most specific hypothesis is represented by:
#     ```['0', '0', '0', '0', '0', '0']```

# In[ ]:


print("The most general hypothesis:",["?"]*num_attributes)
print("The most specific hypothesis:",["0"]*num_attributes)


# ### Algorithm Implementation:
# Implementation of the above algorithm by updating the hypothesis at each iteration and output the final hypothesis.

# In[ ]:


hypothesis=a[0][:-1]          # start from the attributes of the first (positive) training example
print("\n Find S: Finding a maximally specific hypothesis")
for i in range(len(a)):
    if a[i][num_attributes] == "Yes":          # only positive examples update the hypothesis
        for j in range(num_attributes):
            if a[i][j]!=hypothesis[j]:
                hypothesis[j]='?'              # generalize any attribute that does not match
    print("The training example no:",i+1," the hypothesis is:",hypothesis)
print("\n The maximally specific hypothesis for training set is")
print(hypothesis)


Output:


['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes']
['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']
The most general hypothesis: ['?', '?', '?', '?', '?', '?']
The most specific hypothesis: ['0', '0', '0', '0', '0', '0']

 Find S: Finding a maximally specific hypothesis
The training example no: 1  the hypothesis is: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
The training example no: 2  the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
The training example no: 3  the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
The training example no: 4  the hypothesis is: ['Sunny', 'Warm', '?', 'Strong', '?', '?']

 The maximally specific hypothesis for training set is
['Sunny', 'Warm', '?', 'Strong', '?', '?']


Input data Set:














Tuesday, November 20, 2018

Top Interview questions





CATEGORY-WISE INTERVIEW QUESTIONS




click here:

Top Interview questions

A Complete Guide to Mastering Python

A Complete Guide to Mastering Python


Getting Started with Python
Install Python on your machine and get started today.



Things to Learn
Choose where to begin, learn at your own pace:
Click here -->>

A Complete Guide to Mastering Java

A Complete Guide to Mastering Java


Things to Learn
Choose where to begin, learn at your own pace:

Know More

Python Tutorial for Beginners – Introduction to Python (Learn Python From A to Z)

Python Tutorial for Beginners – Introduction to Python (Learn Python From A to Z)


Know More


AI & ML Lab - 18CSL76

  Lab programmes: View