{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "xjgDyfmL6hB_" }, "source": [ "
\n",
| | coding_instruction | code | is_well_coded |
|---|---|---|---|
| 0 | Create a neural network in Python from scratch that takes a 2D array of numbers as its input and predicts the sum of the numbers in each row. | `import numpy as np\n\nclass NeuralNetwork:\n def __init__(self, input_size):\n self.weights = np.random.rand(input_size, 1)\n self.bias = np.random.rand(1, 1)\n\n def predict(self, X):\n return np.dot(X, self.weights) + self.bias\n\n# Example Usage:\ninput_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\nnn = NeuralNetwork(input_size=input_data.shape[1])\npredictions = nn.predict(input_data)\nprint(predictions)` | True |
| 1 | Generate a Python program to find the mode of the given list. | `from statistics import mode\n\ndef find_mode(input_list):\n try:\n result = mode(input_list)\n return result\n except StatisticsError:\n return "No unique mode found"\n\n# Example Usage:\ndata_list = [1, 2, 2, 3, 4, 4, 5]\nmode_result = find_mode(data_list)\nprint(mode_result)` | True |
| 2 | Develop a text summarizer in Python that takes a text input and returns a summary of the text in two sentences. | `from nltk import sent_tokenize\nfrom nltk.corpus import stopwords\nfrom nltk.stem import PorterStemmer\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.metrics.pairwise import cosine_similarity\n\ndef text_summarizer(text):\n sentences = sent_tokenize(text)\n stop_words = set(stopwords.words('english'))\n vectorizer = TfidfVectorizer()\n ps = PorterStemmer()\n\n def preprocess_sentence(sentence):\n words = [ps.stem(word) for word in sentence.split() if word.lower() not in stop_words]\n return ' '.join(words)\n\n preprocessed_sentences = [preprocess_sentence(sentence) for sentence in sentences]\n\n try:\n tfidf_matrix = vectorizer.fit_transform(preprocessed_sentences)\n sentence_similarity = cosine_similarity(tfidf_matrix, tfidf_matrix)\n except ValueError:\n return "Input text is too short for summarization."\n\n sentence_scores = [(index, sum(sentence_similarity[index])) for index in range(len(sentences))]\n sentence_scores = sorted(sentence_scores, key=lambda x: x[1], reverse=True)\n\n summary_sentences = [sentences[score[0]] for score in sentence_scores[:2]]\n summary = ' '.join(summary_sentences)\n\n return summary\n\n# Example Usage:\ninput_text = "Your input text goes here. It should be a longer piece of text for meaningful summarization."\nsummary_result = text_summarizer(input_text)\nprint(summary_result)` | True |
| 3 | Create a program in Python to generate random passwords of 8 characters. | `import random\nimport string\n\ndef generate_random_password():\n password_length = 8\n characters = string.ascii_letters + string.digits + string.punctuation\n password = ''.join(random.choice(characters) for i in range(password_length))\n return password\n\n# Example Usage:\nrandom_password = generate_random_password()\nprint(random_password)` | True |
| 4 | Design a python program to replace every occurrence of a word in a string with another word. | `def replace_word(string, word, replacement_word):\n return string.replace(word)\n\nprint(replace_word("Hello hello world!", "hello", "welcome"))` | False |
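Row 4 is the only sample above labeled `False`. One likely reason, assuming the label reflects whether the snippet runs, is that `str.replace` is called with a single argument, which raises a `TypeError`. The following is a minimal corrected sketch, not part of the dataset; it keeps the sample's function name and assumes case-sensitive matching, since the instruction does not say otherwise.

```python
def replace_word(string, word, replacement_word):
    # str.replace requires both the text to find and its replacement.
    return string.replace(word, replacement_word)


result = replace_word("Hello hello world!", "hello", "welcome")
print(result)  # prints: Hello welcome world!
```

Note that `str.replace` works on substrings, so it would also rewrite `hello` inside a longer word; whole-word matching would need a different approach.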
| | coding_instruction | code | label | explanation |
|---|---|---|---|---|
| 0 | Write a Python program to optimize a given code and make it run faster. | `def optimize_code(code):\n optimized_code = ""\n lines = code.splitlines()\n indentation_level = 0\n\n for line in lines:\n # Remove redundant white space and apply proper indentation\n optimized_code += " " * indentation_level + line.strip() + "\n"\n\n # Increment/Decrement indentation level \n if line.endswith(":"):\n indentation_level += 4\n else:\n if line[0] not in ["#", ""]:\n indentation_level -= 4\n indentation_level = max(indentation_level, 0) # prevent it going below 0\n\n return optimized_code` | is_bug | The instruction asks for a Python program that optimizes a given code to make it run faster. However, the provided code does not perform any optimization to make the code run faster. Instead, it reformats the code by removing redundant white spaces and adjusting indentation. This does not improve the performance of the code, it only makes it more readable. Therefore, the code does not meet the requirements of the instruction. |
| 1 | Develop a Python program to estimate the value of PI using Leibniz's formula. | `def estimate_pi(n_terms: int) -> float:\n numerator: float = 4.0\n denominator: float = 1.0\n operation: float = 1.0\n pi: float = 0.0\n\n for _ in range(n_terms):\n pi += operation * (numerator / denominator)\n denominator += 2.0\n operation *= -1.0\n\n return pi\n\n\nif __name__ == "__main__":\n print(estimate_pi(1000))` | bug_free | The provided Python code correctly implements the Leibniz's formula for estimating the value of PI. The function estimate_pi takes an integer n_terms as input, which represents the number of terms in the Leibniz's series to be used for the estimation. The function then initializes four variables: numerator, denominator, operation, and pi. The numerator is always 4.0, the denominator starts at 1.0 and is incremented by 2.0 after each term, the operation alternates between 1.0 and -1.0 to account for the alternating signs in the series, and pi is the running total of the series. The function then enters a loop that runs for n_terms iterations. In each iteration, the function adds the current term (operation * (numerator / denominator)) to pi, increments the denominator by 2.0, and flips the sign of operation. After the loop, the function returns the estimated value of PI. Therefore, the code is bug-free. |
| 2 | Construct a python script to display the frequency of each words appearing in a document | `from collections import Counter\n \nwith open('text.txt', 'r') as f:\n data = f.read().lower().split()\n \nword_counter = Counter(data)\n \n# Print the sorted most common 10 words\nfor word, count in word_counter.most_common(10):\n print('{}: {}'.format(word, count))` | bug_free | The provided code correctly opens a text file, reads its content, converts it to lowercase, splits it into words, and counts the frequency of each word using the Counter class from the collections module. It then prints the 10 most common words along with their frequencies. The code is correct and efficient in solving the given instruction, which is to display the frequency of each word appearing in a document. However, it assumes that the file 'text.txt' exists and is in the same directory as the script, and it does not handle punctuation or special characters, which could be considered part of the words. But as per the given instruction, it is not explicitly required to handle these cases. |
| 3 | Write a machine learning model in Python to predict the classification of the given email. | `import pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.naive_bayes import MultinomialNB\n\n# read in the data\ndf = pd.read_csv('data.csv')\n\n# split the data into training and test sets\nX_train, X_test, y_train, y_test = train_test_split(df['email_text'], df['class'], test_size=0.2)\n\n# create the TF-IDF vectorizer and fit it to our training data\nvectorizer = TfidfVectorizer(min_df=2)\nX_train = vectorizer.fit_transform(y_train)\n\n# create and train the model\nmodel = MultinomialNB()\nmodel.fit(X_train, y_train)\n\n# predict the classification of the given email\nemail_text = "Holiday alert. Hi everyone, we have planned a holiday on 15th August."\nemail_text_transform = vectorizer.transform([email_text])\nprediction = model.predict(email_text_transform)\n\nprint("The predicted classification of the email is: {}".format(prediction[0]))` | is_bug | The code is supposed to create a machine learning model to predict the classification of a given email. It reads in data, splits it into training and test sets, creates a TF-IDF vectorizer and fits it to the training data, creates and trains a Multinomial Naive Bayes model, and then uses this model to predict the classification of a given email. However, there is a bug in the code. The TF-IDF vectorizer is fit to 'y_train' which is the target variable, instead of 'X_train' which is the feature variable. The vectorizer should be fit to the email text, not the class labels. |
| 4 | Using regex, extend the following python code to remove whitespaces from a string. | `def remove_spaces(user_string):\n '''This function will remove all the spaces from a string'''\n user_string = re.sub(r'\s',' ',user_string)\n return user_string` | is_bug | The instruction is to remove all the spaces from a string using regex. The provided code does use regex to find spaces in the string, but it replaces them with a space, not remove them. The correct code should replace the spaces with an empty string ('') to effectively remove them. Therefore, the code does not meet the instruction requirements. |
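The explanation for row 4 points to substituting an empty string for each whitespace match. A minimal sketch of that fix is shown here; it is an illustration rather than dataset content, and it also adds the `import re` that the sample omits.

```python
import re


def remove_spaces(user_string):
    """Remove all whitespace characters from a string."""
    # Replacing each match with '' deletes it; r'\s' matches spaces,
    # tabs, and newlines, not only the space character.
    return re.sub(r'\s', '', user_string)


cleaned = remove_spaces("a b\tc\nd")
print(cleaned)  # prints: abcd
```

If only literal spaces should go, `user_string.replace(' ', '')` would do, but the instruction asks for a regex.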