Project 10 - Class: PY51SA4L1

Hello everyone,

Below is the exercise I prepared for you to work on in session 10. It also accounts for 50% of the center's assessment score.

(You can download it from Github/OneDrive, under lesson 10.)

Please set aside time to complete it and submit it on time.

Thank you, and good luck!

Score
10

Instructor's comments

Step 5. Feature Engineering: I am adding some extra information for this part in case you need it later, because the outlier-detection step turns up a lot of outliers.

import numpy as np
from collections import Counter

def detect_outliers(df, n, features):
    # Return the index labels of rows that are IQR outliers in more than n of the given features
    outlier_indices = []
    
    # iterate over features(columns)
    for col in features:
        # 1st quartile (25%)
        Q1 = np.percentile(df[col],25)
        
        # 3rd quartile (75%)
        Q3 = np.percentile(df[col],75)
        
        # Interquartile range (IQR)
        IQR = Q3 - Q1
        
        # outlier step
        outlier_step = 1.5 * IQR
        
        # Determine a list of indices of outliers for feature col
        outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step)].index

        # append the found outlier indices for col to the list of outlier indices
        outlier_indices.extend(outlier_list_col)
        
    # select observations containing more than n outliers
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = list(k for k,v in outlier_indices.items() if v > n)
    
    return multiple_outliers

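# pima and cols are not defined in this snippet; they are assumed to be the
# dataset DataFrame and the list of feature columns from the project notebook
Outliers_to_drop = detect_outliers(pima, 2, cols)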
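
To see what the function returns and how its result could be used, here is a minimal sketch on a small made-up table. The function is repeated in condensed form so the snippet runs on its own; the `toy` DataFrame, its column names, and the `cleaned` variable are illustrative assumptions, not part of the project data.

```python
import numpy as np
import pandas as pd
from collections import Counter


def detect_outliers(df, n, features):
    """Return index labels of rows that are IQR outliers in more than n features."""
    outlier_indices = []
    for col in features:
        q1 = np.percentile(df[col], 25)
        q3 = np.percentile(df[col], 75)
        step = 1.5 * (q3 - q1)
        mask = (df[col] < q1 - step) | (df[col] > q3 + step)
        outlier_indices.extend(df[mask].index)
    # Count how many features flagged each row, keep rows flagged more than n times
    counts = Counter(outlier_indices)
    return [idx for idx, count in counts.items() if count > n]


# Hypothetical toy data: row 7 is extreme in all three columns,
# row 0 is extreme only in column "a".
toy = pd.DataFrame({
    "a": [50, 2, 3, 4, 5, 6, 7, 200],
    "b": [1, 2, 3, 4, 5, 6, 7, 300],
    "c": [1, 2, 3, 4, 5, 6, 7, 400],
})

# With n=2, only rows that are outliers in more than two columns are returned.
flagged = detect_outliers(toy, 2, ["a", "b", "c"])

# One possible follow-up: drop the flagged rows and renumber the index.
cleaned = toy.drop(flagged, axis=0).reset_index(drop=True)
print(flagged, len(cleaned))
```

Note that `np.percentile` returns NaN for a column that contains missing values, in which case no row of that column is flagged. Filling or dropping missing values first, or switching to `np.nanpercentile`, avoids that. A lower `n` flags more rows, so if too many rows are being dropped, raising `n` is one option to consider.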

Answer:

Student: Nguyễn Thị Ngọc Quỳnh

Project files:

HỌC VIỆN CÔNG NGHỆ MCI