<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ogunyemi Ezekiel Timilehin]]></title><description><![CDATA[Ogunyemi Ezekiel Timilehin]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev</link><generator>RSS for Node</generator><lastBuildDate>Thu, 25 Jun 2026 23:46:28 GMT</lastBuildDate><atom:link href="https://ogunyemi-ezekiel-timilehin.hashnode.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[End-to-End Employee Attrition Prediction System]]></title><description><![CDATA[Decision Tree & Random Forest Classification
Employee attrition is one of the most expensive silent risks in any organization. Replacing talent costs money, time, productivity, and morale.
In this project, I built a complete machine learning pipeline...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/end-to-end-employee-attrition-prediction-system</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/end-to-end-employee-attrition-prediction-system</guid><category><![CDATA[Decision Tree]]></category><category><![CDATA[Random Forest]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Tue, 17 Feb 2026 19:00:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771354727827/eefe65e8-65f6-4275-854b-4cf90b90c424.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h3 id="heading-decision-tree-amp-random-forest-classification">Decision Tree &amp; Random Forest Classification</h3>
<p>Employee attrition is one of the most expensive silent risks in any organization. Replacing talent costs money, time, productivity, and morale.</p>
<p>In this project, I built a complete machine learning pipeline to predict whether an employee is likely to leave the company using <strong>Decision Trees</strong> and <strong>Random Forest classifiers</strong>.</p>
<p>The goal is simple but powerful:</p>
<blockquote>
<p>Help HR identify at-risk employees early and design proactive retention strategies.</p>
</blockquote>
<hr />
<h1 id="heading-section-a-data-loading-and-exploration">Section A: Data Loading and Exploration</h1>
<p>We begin by loading the dataset and examining its structure, feature types, and overall distribution.</p>
<p>This step ensures:</p>
<ul>
<li><p>There are no structural inconsistencies</p>
</li>
<li><p>Target distribution is understood</p>
</li>
<li><p>Data types are correctly identified</p>
</li>
</ul>
<hr />
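<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

<span class="hljs-comment"># Load dataset</span>
df = pd.read_csv(<span class="hljs-string">"/kaggle/input/week-18/employee_attrition_prediction.csv"</span>)

<span class="hljs-comment"># Inspect structure (in a notebook, run each call in its own cell to see its output)</span>
df.head()
df.shape
df.info()
df.describe()
</code></pre>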
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771353358814/44767c83-e8ce-4edd-8e86-733b0d0df073.png" alt class="image--center mx-auto" /></p>
<p>At this stage, I verified:</p>
<ul>
<li><p>No critical structural issues</p>
</li>
<li><p>An imbalanced but realistic attrition distribution (the 100-row test split reported later contains 13 leavers and 87 stayers)</p>
</li>
<li><p>Presence of both numerical and categorical features</p>
</li>
</ul>
<hr />
<h1 id="heading-section-b-exploratory-data-analysis-eda">Section B: Exploratory Data Analysis (EDA)</h1>
<p>Exploratory Data Analysis is where the business story begins to unfold.</p>
<p>The objective here was to understand how each feature relates to employee attrition.</p>
<hr />
<h2 id="heading-distribution-of-numerical-features-by-attrition-status">Distribution of Numerical Features by Attrition Status</h2>
<p>I analyzed how numerical variables differ between employees who left and those who stayed.</p>
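<p>The comparison described above can be sketched as a simple group-wise summary. This is a minimal illustration on toy data, not the original analysis: only <code>Left_Company</code> is a column name taken from this article, while the other column names and all values are hypothetical.</p>

```python
import pandas as pd

# Toy stand-in for the real dataset. Only 'Left_Company' comes from the
# article; the other column names and every value here are hypothetical.
toy = pd.DataFrame({
    "Monthly_Income": [3000, 3200, 7000, 8000, 2900, 7500],
    "Years_at_Company": [1, 2, 8, 10, 1, 6],
    "Left_Company": [1, 1, 0, 0, 1, 0],
})

numeric_cols = ["Monthly_Income", "Years_at_Company"]

# Mean of each numerical feature, split by attrition status (0 = stayed, 1 = left)
group_means = toy.groupby("Left_Company")[numeric_cols].mean()
print(group_means)
```

<p>On the real data, a box plot per feature (for example with <code>sns.boxplot</code>) would show the same comparison visually.</p>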
<h3 id="heading-insights">Insights</h3>
<p>Clear patterns emerged:</p>
<ul>
<li><p>Employees who left tend to have <strong>lower monthly income</strong></p>
</li>
<li><p>Early-tenure employees (low Years at Company) show higher exit probability</p>
</li>
<li><p>Job satisfaction and work-life balance appear negatively correlated with attrition</p>
</li>
</ul>
<p>This suggests attrition is not random — it is structurally influenced by engagement and compensation variables.</p>
<hr />
<h2 id="heading-categorical-feature-analysis">Categorical Feature Analysis</h2>
<p>Next, I examined categorical features such as:</p>
<ul>
<li><p>Department</p>
</li>
<li><p>Job Role</p>
</li>
<li><p>Overtime</p>
</li>
<li><p>Education Level</p>
</li>
</ul>
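<pre><code class="lang-python"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> seaborn <span class="hljs-keyword">as</span> sns

<span class="hljs-comment"># Analyze categorical features by attrition</span>
categorical_cols = [<span class="hljs-string">'Gender'</span>,<span class="hljs-string">'Education_Level'</span>,<span class="hljs-string">'Department'</span>,<span class="hljs-string">'Job_Role'</span>,<span class="hljs-string">'Overtime'</span>]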

<span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> categorical_cols:
    plt.figure(figsize=(<span class="hljs-number">6</span>,<span class="hljs-number">4</span>))
    sns.countplot(x=col, hue=<span class="hljs-string">'Left_Company'</span>, data=df)
    plt.xticks(rotation=<span class="hljs-number">45</span>)
    plt.title(<span class="hljs-string">f"<span class="hljs-subst">{col}</span> vs Attrition"</span>)
    plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771353512176/124cdd41-5daa-4d5b-a4c6-72b4bd49c3b2.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-insights-1">Insights</h3>
<p>One variable stood out strongly:</p>
<blockquote>
<p><strong>Overtime</strong></p>
</blockquote>
<p>Employees working overtime were significantly more likely to leave.</p>
<p>Departmental differences were also visible, particularly in:</p>
<ul>
<li><p>Sales</p>
</li>
<li><p>Engineering</p>
</li>
</ul>
<p>This highlights operational pressure and workload imbalance as major drivers.</p>
<hr />
<h2 id="heading-correlation-heatmap">Correlation Heatmap</h2>
<p>To understand the relationships between numerical features, I created a correlation matrix and visualized it as a heatmap.</p>
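<p>The original heatmap code is not shown in the post, so the following is only a sketch of how the correlation matrix could be computed. The feature names other than <code>Left_Company</code> and all values are hypothetical stand-ins.</p>

```python
import pandas as pd

# Hypothetical toy data. 'Left_Company' is the target name used in the
# article; the feature names and values are assumptions for illustration.
toy = pd.DataFrame({
    "Job_Satisfaction": [1, 2, 4, 5, 1, 4],
    "Work_Life_Balance": [2, 1, 4, 4, 1, 5],
    "Left_Company": [1, 1, 0, 0, 1, 0],
})

# Pearson correlation between every pair of numerical columns
corr = toy.corr()

# Correlation of each feature with the target, most negative first
target_corr = corr["Left_Company"].drop("Left_Company").sort_values()
print(target_corr)
```

<p>Passing <code>corr</code> to <code>sns.heatmap(corr, annot=True)</code> would render the matrix as a heatmap.</p>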
<h3 id="heading-insights-2">Insights</h3>
<p>While no extreme multicollinearity was observed, engagement variables such as:</p>
<ul>
<li><p>Job Satisfaction</p>
</li>
<li><p>Work-Life Balance</p>
</li>
</ul>
<p>showed meaningful negative relationships with attrition.</p>
<p>This reinforces the behavioral nature of employee exits.</p>
<hr />
<h2 id="heading-eda-summary-findings">EDA Summary Findings</h2>
<ul>
<li><p>Overtime is a dominant risk factor.</p>
</li>
<li><p>Lower job satisfaction significantly increases exit likelihood.</p>
</li>
<li><p>Early-career employees are more vulnerable.</p>
</li>
<li><p>Income plays a stabilizing role.</p>
</li>
<li><p>Attrition appears organizational and behavioral rather than demographic.</p>
</li>
</ul>
<hr />
<h1 id="heading-section-c-data-preprocessing">Section C: Data Preprocessing</h1>
<p>Before modeling, categorical variables were encoded appropriately.</p>
<p>Tree-based models do not require feature scaling because:</p>
<ul>
<li><p>They split on thresholds</p>
</li>
<li><p>They are not distance-based</p>
</li>
<li><p>They are invariant to monotonic transformations</p>
</li>
</ul>
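<p>The scaling claim can be checked with a small experiment. This is a sketch on synthetic data (not the attrition dataset): the same tree is fitted on raw and on standardized features, and the two sets of predictions are compared.</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic integer-valued features; this is not the attrition dataset.
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 100, size=(200, 3)).astype(float)
y_demo = (X_raw[:, 0] + X_raw[:, 1] > 100).astype(int)

# Standardize each column. This is a monotonic, order-preserving transformation.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

tree_raw = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree_scaled = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree_raw.fit(X_raw, y_demo)
tree_scaled.fit(X_scaled, y_demo)

# Thresholds differ, but they split the samples into the same groups
same_predictions = np.array_equal(tree_raw.predict(X_raw), tree_scaled.predict(X_scaled))
print("Predictions identical after scaling:", same_predictions)
```

<p>Because scaling preserves the ordering of values within each feature, the candidate splits partition the samples in the same way, so scaling adds a step without changing what the tree learns.</p>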
<hr />
<h2 id="heading-encoding-categorical-variables">Encoding Categorical Variables</h2>
<ul>
<li><p>Gender → Binary encoding</p>
</li>
<li><p>Education Level → Ordinal encoding</p>
</li>
<li><p>Department → Encoded</p>
</li>
<li><p>Job Role → Label encoded</p>
</li>
<li><p>Overtime → Binary encoding</p>
</li>
</ul>
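<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> LabelEncoder

<span class="hljs-comment"># Handle categorical variables</span>
<span class="hljs-comment"># Note: .map() turns any value missing from the mapping into NaN</span>
df[<span class="hljs-string">'Gender'</span>] = df[<span class="hljs-string">'Gender'</span>].map({<span class="hljs-string">'Male'</span>:<span class="hljs-number">1</span>, <span class="hljs-string">'Female'</span>:<span class="hljs-number">0</span>})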

<span class="hljs-comment"># Education</span>
edu_map = {<span class="hljs-string">'Bachelor'</span>:<span class="hljs-number">0</span>, <span class="hljs-string">'Master'</span>:<span class="hljs-number">1</span>, <span class="hljs-string">'PhD'</span>:<span class="hljs-number">2</span>}
df[<span class="hljs-string">'Education_Level'</span>] = df[<span class="hljs-string">'Education_Level'</span>].map(edu_map)

<span class="hljs-comment"># Overtime</span>
df[<span class="hljs-string">'Overtime'</span>] = df[<span class="hljs-string">'Overtime'</span>].map({<span class="hljs-string">'Yes'</span>:<span class="hljs-number">1</span>, <span class="hljs-string">'No'</span>:<span class="hljs-number">0</span>})

<span class="hljs-comment"># Encode Department and Job_Role</span>
le_dept = LabelEncoder()
df[<span class="hljs-string">'Department'</span>] = le_dept.fit_transform(df[<span class="hljs-string">'Department'</span>])

le_role = LabelEncoder()
df[<span class="hljs-string">'Job_Role'</span>] = le_role.fit_transform(df[<span class="hljs-string">'Job_Role'</span>])
</code></pre>
<hr />
<h2 id="heading-creating-feature-matrix-and-target-vector">Creating Feature Matrix and Target Vector</h2>
<p>Features:<br />All columns except <code>Employee_ID</code> and <code>Left_Company</code></p>
<p>Target:<br /><code>Left_Company</code></p>
<pre><code class="lang-python"><span class="hljs-comment"># Create X and y</span>
X = df.drop(columns=[<span class="hljs-string">'Employee_ID'</span>,<span class="hljs-string">'Left_Company'</span>])
y = df[<span class="hljs-string">'Left_Company'</span>]
</code></pre>
<hr />
<h2 id="heading-train-test-split">Train-Test Split</h2>
<ul>
<li><p>80% training</p>
</li>
<li><p>20% testing</p>
</li>
<li><p>random_state = 42</p>
</li>
</ul>
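<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

<span class="hljs-comment"># Train-test split</span>
X_train, X_test, y_train, y_test = train_test_split(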
    X, y, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>
)
</code></pre>
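<p>With an imbalanced target, one option (not used in the split above) is to pass <code>stratify=y</code> so that both sets keep the same class ratio. The sketch below uses a made-up label vector of 500 rows with 13% positives, chosen only to mirror the class ratio reported in the results; it is not the real data.</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 500 rows, 65 leavers (13%). Not the real dataset.
y_demo = np.array([1] * 65 + [0] * 435)
X_demo = np.arange(500).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Stratification keeps the 13% positive rate in both subsets
print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```

<p>Without stratification, the number of leavers in a small test set can vary from one random split to another, which makes per-class recall harder to compare.</p>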
<hr />
<h1 id="heading-section-d-model-building">Section D: Model Building</h1>
<p>Now we move into modeling.</p>
<hr />
<h1 id="heading-d1-decision-tree-classifier">D1: Decision Tree Classifier</h1>
<p>I first built a Decision Tree using:</p>
<ul>
<li><p>criterion = 'entropy'</p>
</li>
<li><p>random_state = 0</p>
</li>
</ul>
<p>Then experimented with different max_depth values:</p>
<p>3, 5, 7, 10, None</p>
<hr />
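<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeClassifier
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score

<span class="hljs-comment"># Test different max_depth values</span>
depths = [<span class="hljs-number">3</span>,<span class="hljs-number">5</span>,<span class="hljs-number">7</span>,<span class="hljs-number">10</span>,<span class="hljs-literal">None</span>]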
train_acc = []
test_acc = []

<span class="hljs-keyword">for</span> d <span class="hljs-keyword">in</span> depths:
    dt = DecisionTreeClassifier(
        criterion=<span class="hljs-string">'entropy'</span>,
        max_depth=d,
        random_state=<span class="hljs-number">0</span>
    )

    dt.fit(X_train, y_train)

    train_acc.append(accuracy_score(y_train, dt.predict(X_train)))
    test_acc.append(accuracy_score(y_test, dt.predict(X_test)))
</code></pre>
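<p>One way to choose among these depths without consulting the test set is cross-validation on the training data. The following is a sketch on synthetic data from <code>make_classification</code>, not the procedure used in this post; the names <code>X_demo</code>, <code>y_demo</code>, and <code>best_depth</code> are illustrative.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data (an assumption, not the article's data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

depths = [3, 5, 7, 10, None]
cv_means = []

for d in depths:
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=d, random_state=0)
    # 5-fold cross-validated accuracy for this depth
    scores = cross_val_score(tree, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_means.append(scores.mean())

# Pick the depth with the highest mean cross-validated accuracy
best_depth = depths[int(np.argmax(cv_means))]
print("Mean CV accuracy per depth:", dict(zip(depths, cv_means)))
print("Selected max_depth:", best_depth)
```

<p>Selecting the depth this way keeps the test set untouched until the final evaluation, so the reported test score is not influenced by the tuning step.</p>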
<hr />
<p>After experimentation, I selected the optimal depth and trained the final model.</p>
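<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> classification_report

<span class="hljs-comment"># Final Decision Tree</span>
dt_final = DecisionTreeClassifier(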
    criterion=<span class="hljs-string">'entropy'</span>,
    max_depth=<span class="hljs-number">5</span>,  <span class="hljs-comment"># replace with optimal</span>
    random_state=<span class="hljs-number">0</span>
)

dt_final.fit(X_train, y_train)
dt_pred = dt_final.predict(X_test)

print(classification_report(y_test, dt_pred))
</code></pre>
<hr />
<h3 id="heading-decision-tree-results">Decision Tree Results</h3>
<pre><code class="lang-python">precision    recall  f1-score   support

<span class="hljs-number">0</span>       <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>        <span class="hljs-number">87</span>
<span class="hljs-number">1</span>       <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>        <span class="hljs-number">13</span>

accuracy                           <span class="hljs-number">1.00</span>       <span class="hljs-number">100</span>
</code></pre>
<p>The model perfectly classified all test samples.</p>
<p>Confusion Matrix:</p>
<p>Zero false positives.<br />Zero false negatives.</p>
<hr />
<h1 id="heading-d2-random-forest-classifier">D2: Random Forest Classifier</h1>
<p>Next, I implemented Random Forest with:</p>
<ul>
<li><p>criterion = 'entropy'</p>
</li>
<li><p>random_state = 0</p>
</li>
</ul>
<p>I experimented with n_estimators values:<br />10, 50, 100, 150</p>
<hr />
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier

<span class="hljs-comment"># Test different n_estimators</span>
n_values = [<span class="hljs-number">10</span>,<span class="hljs-number">50</span>,<span class="hljs-number">100</span>,<span class="hljs-number">150</span>]
rf_train = []
rf_test = []

<span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> n_values:
    rf = RandomForestClassifier(
        n_estimators=n,
        criterion=<span class="hljs-string">'entropy'</span>,
        random_state=<span class="hljs-number">0</span>
    )

    rf.fit(X_train, y_train)

    rf_train.append(accuracy_score(y_train, rf.predict(X_train)))
    rf_test.append(accuracy_score(y_test, rf.predict(X_test)))
</code></pre>
<hr />
<pre><code class="lang-python"><span class="hljs-comment"># Plot n_estimators vs accuracy</span>
plt.plot(n_values, rf_train, label=<span class="hljs-string">'Train Accuracy'</span>)
plt.plot(n_values, rf_test, label=<span class="hljs-string">'Test Accuracy'</span>)
plt.legend()
plt.xlabel(<span class="hljs-string">"Number of Trees"</span>)
plt.ylabel(<span class="hljs-string">"Accuracy"</span>)
plt.show()
</code></pre>
<hr />
<p>The final model was then trained using the selected number of estimators.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Final Random Forest</span>
rf_final = RandomForestClassifier(
    n_estimators=<span class="hljs-number">100</span>,  <span class="hljs-comment"># replace with optimal</span>
    criterion=<span class="hljs-string">'entropy'</span>,
    random_state=<span class="hljs-number">0</span>
)

rf_final.fit(X_train, y_train)
rf_pred = rf_final.predict(X_test)

print(classification_report(y_test, rf_pred))
</code></pre>
<hr />
<h3 id="heading-random-forest-results">Random Forest Results</h3>
<pre><code class="lang-python">precision    recall  f1-score   support

<span class="hljs-number">0</span>       <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>        <span class="hljs-number">87</span>
<span class="hljs-number">1</span>       <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>      <span class="hljs-number">1.00</span>        <span class="hljs-number">13</span>

accuracy                           <span class="hljs-number">1.00</span>       <span class="hljs-number">100</span>
</code></pre>
<p>Confusion Matrix:</p>
<p>Identical performance to Decision Tree.</p>
<pre><code class="lang-python">Decision Tree CM
[[<span class="hljs-number">87</span>  <span class="hljs-number">0</span>]
 [ <span class="hljs-number">0</span> <span class="hljs-number">13</span>]]
Random Forest CM
[[<span class="hljs-number">87</span>  <span class="hljs-number">0</span>]
 [ <span class="hljs-number">0</span> <span class="hljs-number">13</span>]]
</code></pre>
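<p>The code that produced these matrices is not shown, so here is a minimal sketch of how such a matrix can be computed with <code>confusion_matrix</code>. The label vectors are reconstructed from the reported counts (87 stayers, 13 leavers, no errors) purely for illustration; in the notebook they would be <code>y_test</code> and the model predictions.</p>

```python
from sklearn.metrics import confusion_matrix

# Labels reconstructed from the reported counts, for illustration only
y_true_demo = [0] * 87 + [1] * 13
y_pred_demo = list(y_true_demo)  # a perfect classifier reproduces every label

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("False positives:", fp, "False negatives:", fn)
```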
<hr />
<h1 id="heading-d3-feature-importance-analysis">D3: Feature Importance Analysis</h1>
<p>Tree-based models provide interpretability through feature importance scores.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Extract and visualize feature importance</span>
importances = rf_final.feature_importances_
features = X.columns

importance_df = pd.DataFrame({
    <span class="hljs-string">'Feature'</span>: features,
    <span class="hljs-string">'Importance'</span>: importances
}).sort_values(by=<span class="hljs-string">'Importance'</span>, ascending=<span class="hljs-literal">False</span>)

importance_df.head()
</code></pre>
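<p>Impurity-based importances such as <code>feature_importances_</code> are computed from the training data. As a cross-check, permutation importance on held-out data is a different technique that could be run alongside them. The sketch below uses synthetic data and illustrative variable names; it is not part of the original analysis.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the encoded attrition features (an assumption)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

forest = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)
forest.fit(X_tr, y_tr)

# Impurity-based importances (what the article's code extracts)
impurity_importance = forest.feature_importances_

# Permutation importance: drop in test accuracy when one feature is shuffled
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)

for i in range(X_demo.shape[1]):
    print(f"feature_{i}: impurity={impurity_importance[i]:.3f}, "
          f"permutation={perm.importances_mean[i]:.3f}")
```

<p>If both rankings agree on the top features, that gives more confidence that the ranking is not an artifact of how the trees were grown.</p>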
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771354159457/26a75ce5-c949-4550-be26-d10b7a0bdea7.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771354096401/dcd3b0bd-ff1a-4e90-b948-3539e0ed09ee.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-top-influential-features">Top Influential Features</h2>
<ol>
<li><p>Overtime</p>
</li>
<li><p>Job Satisfaction</p>
</li>
<li><p>Work-Life Balance</p>
</li>
<li><p>Monthly Income</p>
</li>
<li><p>Years at Company</p>
</li>
<li><p>Department</p>
</li>
<li><p>Job Role</p>
</li>
</ol>
<p>Overtime consistently ranked as the strongest predictor.</p>
<hr />
<h1 id="heading-section-e-model-comparison-and-selection">Section E: Model Comparison and Selection</h1>
<hr />
<h2 id="heading-comparison-table">Comparison Table</h2>
<pre><code class="lang-python"><span class="hljs-comment"># Create comparison table</span>
models = {
    <span class="hljs-string">"Decision Tree"</span>: dt_pred,
    <span class="hljs-string">"Random Forest"</span>: rf_pred
}

<span class="hljs-keyword">for</span> name, pred <span class="hljs-keyword">in</span> models.items():
    print(name)
    print(classification_report(y_test, pred))
</code></pre>
<h1 id="heading-model-selection-assessment-most-important-section">Model Selection Assessment (Most Important Section)</h1>
<p>Both models achieved:</p>
<ul>
<li><p>Accuracy: 1.00</p>
</li>
<li><p>Precision: 1.00</p>
</li>
<li><p>Recall: 1.00</p>
</li>
<li><p>F1-Score: 1.00</p>
</li>
</ul>
<p>However, identical performance does not automatically imply identical quality.</p>
<h3 id="heading-why-random-forest-is-preferred">Why Random Forest is Preferred</h3>
<p>Although the Decision Tree achieved perfect accuracy, a single tree is prone to overfitting, especially on small to moderate-sized datasets.</p>
<p>Random Forest:</p>
<ul>
<li><p>Reduces variance via ensemble averaging</p>
</li>
<li><p>Handles nonlinear patterns more robustly</p>
</li>
<li><p>Generalizes better to unseen data</p>
</li>
<li><p>Is less sensitive to minor dataset changes</p>
</li>
</ul>
<p>In business-critical systems such as attrition prediction — where false negatives are costly — stability matters more than simplicity.</p>
<p>Therefore:</p>
<blockquote>
<p>Random Forest is the recommended deployment model.</p>
</blockquote>
<hr />
<h1 id="heading-section-f-final-report">Section F: Final Report</h1>
<hr />
<h1 id="heading-summary-of-findings">Summary of Findings</h1>
<p>The analysis shows that employee attrition is primarily driven by:</p>
<ul>
<li><p>Excessive overtime</p>
</li>
<li><p>Low job satisfaction</p>
</li>
<li><p>Poor work-life balance</p>
</li>
<li><p>Lower income</p>
</li>
<li><p>Short tenure</p>
</li>
</ul>
<p>Attrition risk is concentrated among early-career employees working overtime in high-pressure departments.</p>
<p>Both Decision Tree and Random Forest achieved perfect classification performance. However, Random Forest offers superior theoretical generalization and lower overfitting risk.</p>
<hr />
<h1 id="heading-business-recommendations">Business Recommendations</h1>
<h3 id="heading-workload-management">Workload Management</h3>
<ul>
<li><p>Monitor overtime hours aggressively</p>
</li>
<li><p>Implement burnout prevention policies</p>
</li>
<li><p>Balance departmental workload distribution</p>
</li>
</ul>
<h3 id="heading-engagement-enhancement">Engagement Enhancement</h3>
<ul>
<li><p>Quarterly satisfaction surveys</p>
</li>
<li><p>Managerial coaching programs</p>
</li>
<li><p>Clear career progression structures</p>
</li>
</ul>
<h3 id="heading-compensation-review">Compensation Review</h3>
<ul>
<li><p>Benchmark salaries</p>
</li>
<li><p>Introduce performance-based incentives</p>
</li>
<li><p>Target retention bonuses for high-risk roles</p>
</li>
</ul>
<h3 id="heading-early-career-programs">Early Career Programs</h3>
<ul>
<li><p>Structured onboarding</p>
</li>
<li><p>Mentorship systems</p>
</li>
<li><p>First 3-year retention strategy</p>
</li>
</ul>
<p>Retention strategies should prioritize employees flagged as high-risk by the model.</p>
<hr />
<h1 id="heading-technical-recommendations">Technical Recommendations</h1>
<ul>
<li><p>Deploy Random Forest in production</p>
</li>
<li><p>Monitor recall for attrition class</p>
</li>
<li><p>Retrain model quarterly</p>
</li>
<li><p>Implement drift detection mechanisms</p>
</li>
</ul>
<p>Compared with other classifiers:</p>
<p>KNN:</p>
<ul>
<li>Distance-based, less interpretable</li>
</ul>
<p>SVM:</p>
<ul>
<li>Strong but less explainable</li>
</ul>
<p>Tree-based methods:</p>
<ul>
<li><p>Naturally handle nonlinearities</p>
</li>
<li><p>Provide feature importance</p>
</li>
<li><p>Require minimal preprocessing</p>
</li>
</ul>
<p>For HR datasets, tree-based models are particularly suitable.</p>
<hr />
<h1 id="heading-final-conclusion">Final Conclusion</h1>
<p>This project demonstrates the successful development of a full machine learning pipeline for employee attrition prediction.</p>
<p>While both models achieved perfect performance, Random Forest is the recommended deployment model due to ensemble stability and reduced overfitting risk.</p>
<p>Most importantly, the analysis reveals that attrition is driven by workload intensity, engagement levels, compensation dissatisfaction, and early tenure vulnerability.</p>
<p>A data-driven HR strategy focused on these areas can significantly reduce employee turnover and organizational risk.</p>
<p>Photo credit: Pinterest</p>
]]></content:encoded></item><item><title><![CDATA[Customer Churn Prediction Case Study]]></title><description><![CDATA[End-to-End Machine Learning Project with Business Impact

Project Overview
Customer churn is one of the biggest challenges for subscription-based businesses. For telecom companies in particular, losing a customer often costs significantly more than r...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/customer-churn-prediction-case-study</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/customer-churn-prediction-case-study</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Predicting Customer Churn]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Fri, 30 Jan 2026 17:49:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769795198455/2db9b582-3a23-4436-89fb-cd7b16ff6d35.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>End-to-End Machine Learning Project with Business Impact</em></p>
<hr />
<h2 id="heading-project-overview"><strong>Project Overview</strong></h2>
<p>Customer churn is one of the biggest challenges for subscription-based businesses. For telecom companies in particular, losing a customer often costs significantly more than retaining one.</p>
<p>In this case study, I built an end-to-end machine learning solution to predict customer churn and translate the results into actionable retention strategies. The focus was not just on model performance, but on interpretability, decision-making, and real business impact.</p>
<hr />
<h2 id="heading-problem-statement"><strong>Problem Statement</strong></h2>
<p>A telecommunications company was experiencing increasing customer churn and needed data-driven insights to support its retention efforts.</p>
<p>The business wanted answers to three key questions:</p>
<ol>
<li><p>Which customers are likely to churn?</p>
</li>
<li><p>What factors are driving churn?</p>
</li>
<li><p>How can the retention team act on these insights to reduce customer loss?</p>
</li>
</ol>
<hr />
<h2 id="heading-objective"><strong>Objective</strong></h2>
<p>The goal of this project was to:</p>
<ul>
<li><p>Build a churn prediction model</p>
</li>
<li><p>Identify key churn drivers</p>
</li>
<li><p>Recommend practical, data-backed retention strategies</p>
</li>
<li><p>Design a solution that could realistically be used by business stakeholders</p>
</li>
</ul>
<hr />
<h2 id="heading-dataset-description"><strong>Dataset Description</strong></h2>
<p>The dataset contains <strong>500 customer records</strong> with <strong>19 features</strong>, covering customer demographics, billing, service usage, and support interactions.</p>
<h3 id="heading-feature-categories">Feature categories</h3>
<p><strong>Demographics</strong></p>
<ul>
<li><p>Age</p>
</li>
<li><p>Gender</p>
</li>
</ul>
<p><strong>Account and contract details</strong></p>
<ul>
<li><p>Tenure</p>
</li>
<li><p>Contract type</p>
</li>
<li><p>Payment method</p>
</li>
</ul>
<p><strong>Billing and usage</strong></p>
<ul>
<li><p>Monthly charges</p>
</li>
<li><p>Total charges</p>
</li>
<li><p>Internet service</p>
</li>
<li><p>Phone service</p>
</li>
</ul>
<p><strong>Customer experience</strong></p>
<ul>
<li><p>Support calls</p>
</li>
<li><p>Customer satisfaction score</p>
</li>
</ul>
<p><strong>Service add-ons</strong></p>
<ul>
<li><p>Streaming TV</p>
</li>
<li><p>Streaming movies</p>
</li>
<li><p>Online security</p>
</li>
<li><p>Tech support</p>
</li>
</ul>
<p><strong>Target variable</strong></p>
<ul>
<li>Churn (0 = active, 1 = churned)</li>
</ul>
<hr />
<h2 id="heading-approach-and-methodology"><strong>Approach and Methodology</strong></h2>
<p>I approached the project in structured phases to mirror a real-world data science workflow.</p>
<h3 id="heading-1-data-understanding-and-cleaning">1. Data understanding and cleaning</h3>
<ul>
<li><p>Inspected data types and distributions</p>
</li>
<li><p>Checked for missing values and inconsistencies</p>
</li>
<li><p>Ensured the dataset was suitable for modeling</p>
<pre><code class="lang-python">  <span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

  <span class="hljs-comment"># Load dataset</span>
  df = pd.read_csv(<span class="hljs-string">"/kaggle/input/week-16-regression-3/customer_churn_prediction.csv"</span>)
  df.head()
</code></pre>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769794436275/f4035269-611a-431c-b1a4-c1c1df095c08.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python"><span class="hljs-comment">#Data Overview</span>

df.info()
df.describe()
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment">#check missing values</span>
df.isnull().sum()
</code></pre>
<h3 id="heading-2-exploratory-data-analysis-eda">2. Exploratory Data Analysis (EDA)</h3>
<p>EDA was used to understand customer behavior and uncover churn patterns.</p>
<p>Key findings included:</p>
<ul>
<li><p><strong>Churn distribution is fairly balanced</strong>, with slightly more non-churn customers than churned ones. This means the dataset is suitable for classification without severe class imbalance.</p>
</li>
<li><p><strong>Age shows a mild relationship with churn.</strong> Customers who churn tend to be slightly older on average, though the overlap is large, so age alone is not a strong predictor.</p>
</li>
<li><p><strong>Monthly charges are higher for churned customers.</strong> Customers who left generally pay more per month, suggesting price sensitivity is a major factor influencing churn.</p>
</li>
<li><p><strong>Tenure is clearly related to churn.</strong> Customers with shorter tenure are more likely to churn, while long-term customers tend to stay, indicating loyalty increases over time.</p>
</li>
<li><p><strong>Overall, financial and engagement factors matter more than demographics.</strong> Monthly charges and tenure show stronger separation between churn and non-churn compared to age.</p>
</li>
</ul>
<h3 id="heading-3-feature-engineering-and-preprocessing">3. Feature engineering and preprocessing</h3>
<ul>
<li><p>Encoded categorical variables</p>
</li>
<li><p>Scaled numerical features</p>
</li>
<li><p>Split the data into training and test sets to evaluate generalization</p>
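<pre><code class="lang-python">  <span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> LabelEncoder

  categorical_cols = [
      <span class="hljs-string">"Gender"</span>, <span class="hljs-string">"Contract_Type"</span>, <span class="hljs-string">"Internet_Service"</span>, <span class="hljs-string">"Payment_Method"</span>
  ]

  <span class="hljs-comment"># Fit one encoder per column and keep it, so each mapping can be inverted later</span>
  encoders = {}
  <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> categorical_cols:
      encoders[col] = LabelEncoder()
      df[col] = encoders[col].fit_transform(df[col])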
</code></pre>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment">#Define Features and Target</span>
X = df.drop(columns=[<span class="hljs-string">"Customer_ID"</span>, <span class="hljs-string">"Churn"</span>])
y = df[<span class="hljs-string">"Churn"</span>]
</code></pre>
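<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

<span class="hljs-comment"># Stratified train-test split (keeps the churn ratio the same in both sets)</span>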
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=<span class="hljs-number">0.25</span>, random_state=<span class="hljs-number">42</span>, stratify=y
)
</code></pre>
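<p>The scaling step listed above is not shown in the post. One common way to do it, sketched here on synthetic data rather than the churn dataset, is to put <code>StandardScaler</code> and the model into a single pipeline so the scaler is fitted on the training set only.</p>

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded churn features (an assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# The scaler's mean and variance are learned from X_tr only,
# then reused unchanged when scoring X_te
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

test_accuracy = model.score(X_te, y_te)
print("Test accuracy:", test_accuracy)
```

<p>Fitting the scaler inside the pipeline avoids leaking test-set statistics into training, which matters for distance- and gradient-based models such as Logistic Regression.</p>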
<hr />
<h2 id="heading-modeling-and-evaluation"><strong>Modeling and Evaluation</strong></h2>
<p>I trained and evaluated two classification models:</p>
<ul>
<li><p>Logistic Regression</p>
</li>
<li><p>Random Forest Classifier</p>
</li>
</ul>
<p>Because churn prediction has <strong>asymmetric business costs</strong>, I evaluated models using accuracy, precision, recall, F1 score, and AUC.</p>
<h3 id="heading-model-performance-summary">Model performance summary</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>Accuracy</td><td>Recall</td><td>F1 Score</td><td>AUC</td></tr>
</thead>
<tbody>
<tr>
<td>Logistic Regression</td><td>0.576</td><td>0.544</td><td>0.539</td><td>0.586</td></tr>
<tr>
<td>Random Forest</td><td>0.528</td><td>0.491</td><td>0.487</td><td>0.578</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769794780670/00a4116f-d679-4053-a2bb-5c5e9ab9d249.png" alt class="image--center mx-auto" /></p>
<p>Logistic Regression outperformed Random Forest on every reported metric, although both models score only moderately above chance.</p>
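<p>Metrics like those in the table can be computed with <code>sklearn.metrics</code>. The sketch below uses a handful of made-up labels and probabilities to show the calls; the numbers are not from the churn models.</p>

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth and predicted churn probabilities (illustration only)
y_true_demo = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob_demo = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9, 0.3, 0.55]

# Hard predictions at the default 0.5 threshold
y_pred_demo = [int(p >= 0.5) for p in y_prob_demo]

metrics = {
    "accuracy": accuracy_score(y_true_demo, y_pred_demo),
    "precision": precision_score(y_true_demo, y_pred_demo),
    "recall": recall_score(y_true_demo, y_pred_demo),
    "f1": f1_score(y_true_demo, y_pred_demo),
    # AUC uses the probabilities, not the thresholded predictions
    "auc": roc_auc_score(y_true_demo, y_prob_demo),
}
print(metrics)
```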
<hr />
<h2 id="heading-model-selection-rationale"><strong>Model Selection Rationale</strong></h2>
<p>Logistic Regression was selected for deployment for three main reasons:</p>
<ol>
<li><p>Better overall performance and higher recall, which is critical for identifying at-risk customers</p>
</li>
<li><p>Strong interpretability, allowing business stakeholders to understand why customers churn</p>
</li>
<li><p>Better alignment with business needs, where missing a churner is more costly than a false alarm</p>
</li>
</ol>
<p>To further improve recall, I recommended lowering the probability threshold from 0.5 to approximately 0.4.</p>
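<p>The effect of lowering the threshold can be illustrated with a few made-up probabilities. In a real pipeline these would come from something like <code>model.predict_proba(X_test)[:, 1]</code>; the values below are hypothetical.</p>

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical churn probabilities and true labels (illustration only)
churn_prob = np.array([0.15, 0.42, 0.48, 0.55, 0.80])
y_true_demo = np.array([0, 1, 0, 1, 1])

# Default threshold versus the lowered threshold recommended in the text
pred_default = (churn_prob >= 0.5).astype(int)
pred_lowered = (churn_prob >= 0.4).astype(int)

recall_default = recall_score(y_true_demo, pred_default)
recall_lowered = recall_score(y_true_demo, pred_lowered)
precision_default = precision_score(y_true_demo, pred_default)
precision_lowered = precision_score(y_true_demo, pred_lowered)

print("Recall:   ", recall_default, "->", recall_lowered)
print("Precision:", precision_default, "->", precision_lowered)
```

<p>In this toy case the lower threshold catches every churner but also flags one loyal customer, which is the trade-off the retention team accepts when a missed churner costs more than a false alarm.</p>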
<hr />
<h2 id="heading-key-churn-drivers"><strong>Key Churn Drivers</strong></h2>
<p>Using feature importance analysis, the strongest churn drivers were identified as:</p>
<ul>
<li><p>Monthly charges</p>
</li>
<li><p>Tenure</p>
</li>
<li><p>Total charges</p>
</li>
<li><p>Customer satisfaction score</p>
</li>
<li><p>Contract type</p>
</li>
<li><p>Support call frequency</p>
</li>
<li><p>Age</p>
</li>
</ul>
<p>These results show that churn is driven primarily by <strong>pricing, customer experience, and relationship duration</strong> rather than static demographic attributes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769794818811/65141ff1-274c-4a7b-ad63-d6e6e4e87122.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-business-recommendations"><strong>Business Recommendations</strong></h2>
<p>Based on the insights from the model and EDA, I proposed the following actions:</p>
<ul>
<li><p>Offer targeted discounts or flexible pricing to high-billing customers</p>
</li>
<li><p>Strengthen onboarding and engagement during the first three to six months</p>
</li>
<li><p>Trigger proactive outreach when customer satisfaction scores drop</p>
</li>
<li><p>Improve support quality for customers with frequent service calls</p>
</li>
<li><p>Encourage long-term contracts through incentives</p>
</li>
<li><p>Personalize retention strategies by age group</p>
</li>
</ul>
<p>These recommendations directly link model insights to measurable business actions.</p>
<hr />
<h2 id="heading-implementation-strategy"><strong>Implementation Strategy</strong></h2>
<p>To ensure the solution remains effective in production, I recommended:</p>
<ul>
<li><p>Retraining the model every three to six months</p>
</li>
<li><p>Monitoring recall, AUC, churn rate, and false negative rate</p>
</li>
<li><p>Measuring business impact through retention campaign success and customer lifetime value</p>
</li>
</ul>
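<p>As a rough sketch of the monitoring step, the metrics listed above could be computed for each scoring period with a small helper. The function name <code>monitoring_report</code> and the sample labels and scores are hypothetical; the 0.4 threshold follows the earlier recommendation.</p>

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score, roc_auc_score

def monitoring_report(y_true, y_proba, threshold=0.4):
    """Summarize the metrics to track for one scoring period."""
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba)
    y_pred = (y_proba >= threshold).astype(int)
    # With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "auc": roc_auc_score(y_true, y_proba),
        "churn_rate": float(y_true.mean()),
        "false_negative_rate": float(fn / (fn + tp)) if (fn + tp) else 0.0,
    }

# Hypothetical labels (1 = churned) and model scores for one period
report = monitoring_report(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_proba=[0.9, 0.2, 0.35, 0.6, 0.5, 0.1, 0.8, 0.3],
)
print(report)
```

<p>Tracking these values over time makes it easier to spot drift and decide whether the retraining window should be shortened.</p>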
<hr />
<h2 id="heading-limitations-and-future-improvements"><strong>Limitations and Future Improvements</strong></h2>
<p>While the model provides useful insights, its predictive power is moderate.</p>
<p>Future improvements could include:</p>
<ul>
<li><p>Adding time-series and behavioral usage data</p>
</li>
<li><p>Incorporating complaint resolution history</p>
</li>
<li><p>Testing advanced models such as Gradient Boosting or XGBoost</p>
</li>
<li><p>Addressing potential class imbalance</p>
</li>
<li><p>Integrating near real-time customer activity</p>
</li>
</ul>
<hr />
<h2 id="heading-results-and-impact"><strong>Results and Impact</strong></h2>
<p>Although this was an offline project, the expected business impact includes:</p>
<ul>
<li><p>Earlier identification of at-risk customers</p>
</li>
<li><p>More targeted and cost-effective retention campaigns</p>
</li>
<li><p>Reduced customer churn and improved customer lifetime value</p>
</li>
</ul>
<p>The project demonstrates how even moderately performing models can deliver meaningful value when combined with strong business understanding.</p>
<hr />
<h2 id="heading-key-skills-demonstrated"><strong>Key Skills Demonstrated</strong></h2>
<ul>
<li><p>Business problem framing</p>
</li>
<li><p>Exploratory data analysis</p>
</li>
<li><p>Feature engineering and preprocessing</p>
</li>
<li><p>Classification modeling</p>
</li>
<li><p>Model evaluation and selection</p>
</li>
<li><p>Translating machine learning outputs into business strategy</p>
</li>
<li><p>Communicating insights to non-technical stakeholders</p>
</li>
</ul>
<hr />
<h2 id="heading-final-reflection"><strong>Final Reflection</strong></h2>
<p>This case study highlights an important principle of applied data science: models do not need to be perfect to be useful. What matters most is understanding the problem, interpreting results correctly, and turning insights into action.</p>
<p>This project showcases my ability to think beyond metrics and build solutions that support real business decisions.</p>
<hr />
<p>Image credit: Pinterest</p>
]]></content:encoded></item><item><title><![CDATA[Building a Real-World Car Price Prediction System with Machine Learning]]></title><description><![CDATA[Pricing used cars accurately is a major challenge in the automotive industry. Overpricing leads to slow sales, while underpricing reduces profit margins. In this project, I built an end-to-end Car Price Prediction System that uses machine learning to...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/building-a-real-world-car-price-prediction-system-with-machine-learning</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/building-a-real-world-car-price-prediction-system-with-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[#Regression]]></category><category><![CDATA[business analytics]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Wed, 21 Jan 2026 16:28:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769012742548/f9535abc-2ec7-465e-a36b-6a73e7e9b571.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>Pricing used cars accurately is a major challenge in the automotive industry. Overpricing leads to slow sales, while underpricing reduces profit margins. In this project, I built an end-to-end <strong>Car Price Prediction System</strong> that uses machine learning to estimate fair market prices for used cars based on their features.</p>
<p>This project applies concepts from <strong>Weeks 14–15 of my machine learning coursework</strong>, covering data preprocessing, exploratory data analysis, regression models, evaluation techniques, and real-world business interpretation.</p>
<hr />
<h2 id="heading-project-objective">Project Objective</h2>
<p>The goal of this project is to build an intelligent pricing system that helps an automotive company:</p>
<ul>
<li><p>Price vehicles competitively</p>
</li>
<li><p>Identify undervalued cars for purchase</p>
</li>
<li><p>Maximize profit margins</p>
</li>
<li><p>Provide instant price estimates to customers</p>
</li>
</ul>
<p>The system predicts car prices using structured vehicle data such as brand, mileage, engine size, and service history.</p>
<hr />
<h2 id="heading-dataset-overview">Dataset Overview</h2>
<p>The dataset used is:</p>
<p><code>assessment_car_price_prediction.csv</code></p>
<p>It contains <strong>200 records of used cars</strong> with a mix of numerical and categorical features.</p>
<h3 id="heading-key-features">Key Features</h3>
<ul>
<li><p>Brand</p>
</li>
<li><p>Year</p>
</li>
<li><p>Mileage</p>
</li>
<li><p>Engine Size</p>
</li>
<li><p>Horsepower</p>
</li>
<li><p>Fuel Type</p>
</li>
<li><p>Transmission</p>
</li>
<li><p>Previous Owners</p>
</li>
<li><p>Accident History</p>
</li>
<li><p>Service Records</p>
</li>
</ul>
<h3 id="heading-target-variable">Target Variable</h3>
<ul>
<li><strong>Price (USD)</strong></li>
</ul>
<hr />
<h2 id="heading-phase-1-data-understanding-amp-preprocessing">Phase 1: Data Understanding &amp; Preprocessing</h2>
<h3 id="heading-11-data-loading-and-initial-exploration">1.1 Data Loading and Initial Exploration</h3>
<p>The first step was to load the dataset and inspect its structure. This included checking the number of rows and columns, reviewing data types, examining sample records, and confirming data completeness.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Data loading and exploration: load the dataset and inspect its structure, data types, and completeness</span>
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-comment"># Load dataset</span>
df = pd.read_csv(<span class="hljs-string">"/kaggle/input/week15-dataset/assessment_car_price_prediction.csv"</span>)

<span class="hljs-comment"># Basic information</span>
df.shape, df.info()

<span class="hljs-comment"># View data</span>
df.head(), df.tail()

<span class="hljs-comment"># Statistical summary</span>
df.describe()

<span class="hljs-comment"># Missing values</span>
df.isnull().sum()
</code></pre>
<p>This inspection revealed that:</p>
<ul>
<li><p>The dataset contains <strong>200 rows and 11 columns</strong></p>
</li>
<li><p>There are <strong>no missing values</strong></p>
</li>
<li><p>Five features are categorical, while the rest are numerical</p>
</li>
</ul>
<p>This clean structure allowed me to proceed directly to exploratory analysis.</p>
<hr />
<h3 id="heading-12-exploratory-data-analysis-eda">1.2 Exploratory Data Analysis (EDA)</h3>
<p>EDA helps uncover patterns, trends, and anomalies that influence pricing behavior.</p>
<p>Key visual analyses performed include:</p>
<ul>
<li><p>Distribution of car prices</p>
</li>
<li><p>Price variation by brand</p>
</li>
<li><p>Price variation by fuel type</p>
</li>
<li><p>Correlation analysis among numerical features</p>
</li>
<li><p>Relationship between mileage and price</p>
</li>
<li><p>Relationship between vehicle year and price</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769011916459/2026f939-6138-4b69-ae43-720bf77cfa24.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769011967485/952f08a1-81cb-4a1e-8408-4fe44ec5a189.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769011988848/7d33c0ba-da9c-4da3-8d07-0ffdd48fd760.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769012018342/4c1876d8-a225-4db1-866b-fe08ca648e1c.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769012048535/71004bb7-f4f5-4667-9d4f-5c8ecda6d43e.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769012067119/b2a27bcc-d126-43b3-babc-9a76389ad389.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-key-insights-from-eda">Key Insights from EDA</h4>
<ul>
<li><p>Car prices are right-skewed: most cars sit in the lower price range, with a smaller number of expensive luxury vehicles forming a long upper tail.</p>
</li>
<li><p>Premium brands command higher median prices.</p>
</li>
<li><p>Mileage shows a strong negative relationship with price.</p>
</li>
<li><p>Newer vehicles consistently sell for higher prices.</p>
</li>
</ul>
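<p>Two of these observations, the skew in prices and the negative mileage relationship, can also be checked numerically. The sketch below assumes synthetic data (the <code>demo</code> DataFrame and its price formula are made up for illustration) rather than the real dataset.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-in data (assumption): prices fall with mileage and are right-skewed
mileage = rng.uniform(5_000, 200_000, n)
price = np.exp(10.5 - 0.000008 * mileage + rng.normal(0, 0.25, n))
demo = pd.DataFrame({"Mileage": mileage, "Price": price})

price_skew = demo["Price"].skew()                    # positive values mean a longer right tail
mileage_corr = demo["Mileage"].corr(demo["Price"])   # Pearson correlation coefficient

print(f"Price skewness: {price_skew:.2f}")
print(f"Mileage vs. price correlation: {mileage_corr:.2f}")
```

<p>On the real data, the same two calls on <code>df["Price"]</code> and <code>df["Mileage"]</code> would put numbers behind the patterns seen in the plots.</p>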
<hr />
<h3 id="heading-13-data-preprocessing">1.3 Data Preprocessing</h3>
<p>Before modeling, the data was transformed into a machine-learning-ready format.</p>
<h4 id="heading-categorical-encoding">Categorical Encoding</h4>
<ul>
<li><p>Brand, Fuel Type, and Transmission were one-hot encoded.</p>
</li>
<li><p>Accident History and Service Records were label encoded (Yes = 1, No = 0).</p>
</li>
</ul>
<h4 id="heading-feature-scaling">Feature Scaling</h4>
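<p>Numerical features such as mileage and horsepower were standardized (rescaled to zero mean and unit variance) so that features measured on large scales, such as mileage, do not dominate scale-sensitive models like SVR.</p>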
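<p>A minimal sketch of what standardization does, assuming a few hypothetical mileage and horsepower values (the array <code>X_demo</code> is not taken from the dataset):</p>

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical mileage and horsepower values on very different scales
X_demo = np.array([
    [20_000.0, 110.0],
    [85_000.0, 150.0],
    [140_000.0, 300.0],
    [60_000.0, 200.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)  # z = (x - mean) / std, computed per column

print("Column means after scaling:", X_scaled.mean(axis=0).round(6))
print("Column std devs after scaling:", X_scaled.std(axis=0).round(6))
```

<p>After scaling, both columns have mean 0 and standard deviation 1. In a real workflow the scaler should be fitted on the training split only and then applied to the test split, so that no test information leaks into training.</p>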
<h4 id="heading-train-test-split">Train-Test Split</h4>
<p>The dataset was split into:</p>
<ul>
<li><p><strong>70% training data</strong></p>
</li>
<li><p><strong>30% testing data</strong><br />  with <code>random_state = 42</code> for reproducibility.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># Data Preprocessing (Categorical Encoding)</span>
<span class="hljs-keyword">import</span> time
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> OneHotEncoder, LabelEncoder
<span class="hljs-keyword">from</span> sklearn.pipeline <span class="hljs-keyword">import</span> Pipeline
<span class="hljs-keyword">from</span> sklearn.compose <span class="hljs-keyword">import</span> ColumnTransformer
<span class="hljs-keyword">from</span> sklearn.impute <span class="hljs-keyword">import</span> SimpleImputer
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> StandardScaler
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

<span class="hljs-comment"># Label encoding (Yes = 1, No = 0)</span>
df[<span class="hljs-string">'Accident_History'</span>] = df[<span class="hljs-string">'Accident_History'</span>].map({<span class="hljs-string">'Yes'</span>:<span class="hljs-number">1</span>, <span class="hljs-string">'No'</span>:<span class="hljs-number">0</span>})
df[<span class="hljs-string">'Service_Records'</span>] = df[<span class="hljs-string">'Service_Records'</span>].map({<span class="hljs-string">'Yes'</span>:<span class="hljs-number">1</span>, <span class="hljs-string">'No'</span>:<span class="hljs-number">0</span>})

<span class="hljs-comment"># Features and target</span>
X = df.drop(<span class="hljs-string">'Price'</span>, axis=<span class="hljs-number">1</span>)
y = df[<span class="hljs-string">'Price'</span>]

<span class="hljs-comment"># 70/30 train-test split; X_train and X_test are used by the steps that follow</span>
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=<span class="hljs-number">0.3</span>, random_state=<span class="hljs-number">42</span>
)

categorical = [<span class="hljs-string">'Brand'</span>, <span class="hljs-string">'Fuel_Type'</span>, <span class="hljs-string">'Transmission'</span>]
numerical = [<span class="hljs-string">'Year'</span>, <span class="hljs-string">'Mileage'</span>, <span class="hljs-string">'Engine_Size'</span>, <span class="hljs-string">'Horsepower'</span>, <span class="hljs-string">'Previous_Owners'</span>,
             <span class="hljs-string">'Accident_History'</span>, <span class="hljs-string">'Service_Records'</span>]

preprocessor = ColumnTransformer([
    (<span class="hljs-string">'cat'</span>, OneHotEncoder(drop=<span class="hljs-string">'first'</span>), categorical),
    (<span class="hljs-string">'num'</span>, StandardScaler(), numerical)
])
</code></pre>
<pre><code class="lang-python">X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)
</code></pre>
<pre><code class="lang-python">categorical = [<span class="hljs-string">'Brand'</span>, <span class="hljs-string">'Fuel_Type'</span>, <span class="hljs-string">'Transmission'</span>]
numerical = [
    <span class="hljs-string">'Year'</span>, <span class="hljs-string">'Mileage'</span>, <span class="hljs-string">'Engine_Size'</span>, <span class="hljs-string">'Horsepower'</span>,
    <span class="hljs-string">'Previous_Owners'</span>, <span class="hljs-string">'Accident_History'</span>, <span class="hljs-string">'Service_Records'</span>
]

<span class="hljs-comment"># Categorical pipeline</span>
cat_pipeline = Pipeline(steps=[
    (<span class="hljs-string">'imputer'</span>, SimpleImputer(strategy=<span class="hljs-string">'most_frequent'</span>)),
    (<span class="hljs-string">'encoder'</span>, OneHotEncoder(drop=<span class="hljs-string">'first'</span>, handle_unknown=<span class="hljs-string">'ignore'</span>))
])

<span class="hljs-comment"># Numerical pipeline</span>
num_pipeline = Pipeline(steps=[
    (<span class="hljs-string">'imputer'</span>, SimpleImputer(strategy=<span class="hljs-string">'median'</span>)),
    (<span class="hljs-string">'scaler'</span>, StandardScaler())
])

<span class="hljs-comment"># Combine pipelines</span>
preprocessor = ColumnTransformer([
    (<span class="hljs-string">'cat'</span>, cat_pipeline, categorical),
    (<span class="hljs-string">'num'</span>, num_pipeline, numerical)
])
</code></pre>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

print(<span class="hljs-string">"Missing values in X_train:"</span>,
      np.isnan(X_train_processed.toarray() <span class="hljs-keyword">if</span> hasattr(X_train_processed, <span class="hljs-string">"toarray"</span>) <span class="hljs-keyword">else</span> X_train_processed).sum())

print(<span class="hljs-string">"Missing values in X_test:"</span>,
      np.isnan(X_test_processed.toarray() <span class="hljs-keyword">if</span> hasattr(X_test_processed, <span class="hljs-string">"toarray"</span>) <span class="hljs-keyword">else</span> X_test_processed).sum())
</code></pre>
<pre><code class="lang-python">print(<span class="hljs-string">"X_train shape:"</span>, X_train_processed.shape)
print(<span class="hljs-string">"X_test shape:"</span>, X_test_processed.shape)
print(<span class="hljs-string">"y_train shape:"</span>, y_train.shape)
print(<span class="hljs-string">"y_test shape:"</span>, y_test.shape)
</code></pre>
<p>All checks confirmed that the processed datasets contained no missing values and were correctly shaped.</p>
<hr />
<h2 id="heading-phase-2-model-development">Phase 2: Model Development</h2>
<p>To identify the most suitable model, multiple regression techniques were tested.</p>
<hr />
<h3 id="heading-21-baseline-model-multiple-linear-regression">2.1 Baseline Model: Multiple Linear Regression</h3>
<p>Linear Regression was chosen as the baseline model due to its simplicity and interpretability.</p>
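<pre><code class="lang-python"><span class="hljs-comment"># Baseline Model: Multiple Linear Regression</span>
<span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LinearRegression
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> r2_score, mean_absolute_error, mean_squared_error
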
lr_pipeline = Pipeline(steps=[
    (<span class="hljs-string">"preprocessor"</span>, preprocessor),
    (<span class="hljs-string">"model"</span>, LinearRegression())
])

start = time.time()
lr_pipeline.fit(X_train, y_train)
train_time_lr = time.time() - start

y_train_pred_lr = lr_pipeline.predict(X_train)
y_test_pred_lr = lr_pipeline.predict(X_test)
</code></pre>
<pre><code class="lang-python">lr_results = {
    <span class="hljs-string">"Model"</span>: <span class="hljs-string">"Linear Regression"</span>,
    <span class="hljs-string">"Train R2"</span>: r2_score(y_train, y_train_pred_lr),
    <span class="hljs-string">"Test R2"</span>: r2_score(y_test, y_test_pred_lr),
    <span class="hljs-string">"MAE"</span>: mean_absolute_error(y_test, y_test_pred_lr),
    <span class="hljs-string">"RMSE"</span>: mean_squared_error(y_test, y_test_pred_lr) ** <span class="hljs-number">0.5</span>,
    <span class="hljs-string">"Training Time"</span>: train_time_lr
}
</code></pre>
<p>Evaluation metrics included:</p>
<ul>
<li><p>R² Score</p>
</li>
<li><p>Mean Absolute Error (MAE)</p>
</li>
<li><p>Root Mean Squared Error (RMSE)</p>
</li>
<li><p>Training time</p>
</li>
</ul>
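<p>To make these metrics concrete, here is a small worked example that computes them by hand with NumPy. The four actual and predicted prices are hypothetical values chosen for easy arithmetic.</p>

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical actual and predicted prices (USD)
y_true = np.array([10_000.0, 20_000.0, 30_000.0, 40_000.0])
y_pred = np.array([12_000.0, 18_000.0, 33_000.0, 39_000.0])

residuals = y_true - y_pred

mae = np.mean(np.abs(residuals))            # average size of the error
rmse = np.sqrt(np.mean(residuals ** 2))     # penalizes large errors more heavily
ss_res = np.sum(residuals ** 2)             # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
r2 = 1 - ss_res / ss_tot                    # share of variation explained

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R2:   {r2:.3f}")

# The manual results match scikit-learn's implementations
print(np.isclose(mae, mean_absolute_error(y_true, y_pred)))
print(np.isclose(r2, r2_score(y_true, y_pred)))
```

<p>For these four cars the MAE is 2,000, the RMSE is about 2,121, and R² is 0.964. RMSE is larger than MAE because the single 3,000 error is weighted more heavily once squared.</p>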
<hr />
<h3 id="heading-22-polynomial-regression">2.2 Polynomial Regression</h3>
<p>Polynomial Regression was tested with degrees 2 and 3 to capture non-linear relationships.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Polynomial Regression</span>
<span class="hljs-keyword">from</span> sklearn.pipeline <span class="hljs-keyword">import</span> Pipeline
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> PolynomialFeatures
<span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LinearRegression
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> r2_score, mean_absolute_error, mean_squared_error
<span class="hljs-keyword">import</span> time

<span class="hljs-comment"># Store evaluation results (FOR TABLE)</span>
poly_results = []

<span class="hljs-comment"># Store trained models (FOR PLOTTING / SELECTION)</span>
poly_models = {}

<span class="hljs-keyword">for</span> degree <span class="hljs-keyword">in</span> [<span class="hljs-number">2</span>, <span class="hljs-number">3</span>]:
    poly_pipeline = Pipeline([
        (<span class="hljs-string">'preprocessor'</span>, preprocessor),
        (<span class="hljs-string">'poly'</span>, PolynomialFeatures(degree=degree, include_bias=<span class="hljs-literal">False</span>)),
        (<span class="hljs-string">'model'</span>, LinearRegression())
    ])

    start = time.time()
    poly_pipeline.fit(X_train, y_train)
    train_time = time.time() - start

    y_train_pred = poly_pipeline.predict(X_train)
    y_test_pred = poly_pipeline.predict(X_test)

    <span class="hljs-comment"># Save model</span>
    poly_models[degree] = poly_pipeline

    <span class="hljs-comment"># Save evaluation metrics</span>
    poly_results.append({
        <span class="hljs-string">"Model"</span>: <span class="hljs-string">f"Polynomial Regression (deg <span class="hljs-subst">{degree}</span>)"</span>,
        <span class="hljs-string">"Train R2"</span>: r2_score(y_train, y_train_pred),
        <span class="hljs-string">"Test R2"</span>: r2_score(y_test, y_test_pred),
        <span class="hljs-string">"MAE"</span>: mean_absolute_error(y_test, y_test_pred),
        <span class="hljs-string">"RMSE"</span>: mean_squared_error(y_test, y_test_pred) ** <span class="hljs-number">0.5</span>,
        <span class="hljs-string">"Training Time"</span>: train_time
    })
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment"># Select the best polynomial degree (degree 2 generalized better than degree 3 on the test set)</span>
best_poly_degree = <span class="hljs-number">2</span>
poly_pipeline = poly_models[best_poly_degree]

y_train_pred_poly = poly_pipeline.predict(X_train)
y_test_pred_poly = poly_pipeline.predict(X_test)
</code></pre>
<p>While higher degrees improved training performance, they showed reduced generalization on test data.</p>
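<p>One reason for this is how quickly the number of polynomial terms grows. The sketch below counts the generated features for degrees 2 and 3, assuming a hypothetical 15 input columns after encoding and scaling (the real column count depends on how many brands and fuel types are in the data).</p>

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_input_features = 15  # hypothetical column count after encoding and scaling
placeholder = np.zeros((1, n_input_features))  # only the shape matters for counting

feature_counts = {}
for degree in (2, 3):
    poly = PolynomialFeatures(degree=degree, include_bias=False).fit(placeholder)
    feature_counts[degree] = poly.n_output_features_
    print(f"degree={degree}: {feature_counts[degree]} features")
```

<p>With 15 inputs, degree 2 produces 135 features and degree 3 produces 815. A 70% split of 200 records leaves only 140 training rows, so a degree-3 model has far more coefficients than observations and can fit the training set almost exactly without learning anything that transfers to new cars.</p>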
<hr />
<h3 id="heading-23-support-vector-regression-svr">2.3 Support Vector Regression (SVR)</h3>
<p>Support Vector Regression with an RBF kernel was evaluated using different hyperparameter configurations.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Support Vector Regression (RBF kernel)</span>
<span class="hljs-keyword">from</span> sklearn.pipeline <span class="hljs-keyword">import</span> Pipeline
<span class="hljs-keyword">from</span> sklearn.svm <span class="hljs-keyword">import</span> SVR
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> r2_score, mean_absolute_error, mean_squared_error
<span class="hljs-keyword">import</span> time

svr_results = []

svr_configs = [
    {<span class="hljs-string">'C'</span>: <span class="hljs-number">100</span>, <span class="hljs-string">'gamma'</span>: <span class="hljs-string">'auto'</span>},
    {<span class="hljs-string">'C'</span>: <span class="hljs-number">1000</span>, <span class="hljs-string">'gamma'</span>: <span class="hljs-string">'scale'</span>}
]

<span class="hljs-keyword">for</span> cfg <span class="hljs-keyword">in</span> svr_configs:
    svr_pipeline = Pipeline([
        (<span class="hljs-string">'preprocessor'</span>, preprocessor),  <span class="hljs-comment"># encoding + imputation + scaling</span>
        (<span class="hljs-string">'svr'</span>, SVR(kernel=<span class="hljs-string">'rbf'</span>, **cfg))
    ])

    start = time.time()
    svr_pipeline.fit(X_train, y_train)
    train_time = time.time() - start

    y_train_pred_svr = svr_pipeline.predict(X_train)
    y_test_pred_svr = svr_pipeline.predict(X_test)

    <span class="hljs-comment"># Score this configuration's own predictions</span>
    svr_results.append({
        <span class="hljs-string">"Model"</span>: <span class="hljs-string">f"SVR (RBF) C=<span class="hljs-subst">{cfg[<span class="hljs-string">'C'</span>]}</span> gamma=<span class="hljs-subst">{cfg[<span class="hljs-string">'gamma'</span>]}</span>"</span>,
        <span class="hljs-string">"Train R2"</span>: r2_score(y_train, y_train_pred_svr),
        <span class="hljs-string">"Test R2"</span>: r2_score(y_test, y_test_pred_svr),
        <span class="hljs-string">"MAE"</span>: mean_absolute_error(y_test, y_test_pred_svr),
        <span class="hljs-string">"RMSE"</span>: mean_squared_error(y_test, y_test_pred_svr) ** <span class="hljs-number">0.5</span>,
        <span class="hljs-string">"Training Time"</span>: train_time})
</code></pre>
<p>Although SVR performed well on training data, it showed signs of overfitting.</p>
<hr />
<h3 id="heading-24-decision-tree-regression">2.4 Decision Tree Regression</h3>
<p>Decision Trees were tested with varying depths to balance bias and variance.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Decision Tree Regression</span>
<span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeRegressor

dt_results = []

<span class="hljs-keyword">for</span> depth <span class="hljs-keyword">in</span> [<span class="hljs-number">3</span>, <span class="hljs-number">5</span>, <span class="hljs-number">10</span>, <span class="hljs-literal">None</span>]:
    dt_pipeline = Pipeline([
        (<span class="hljs-string">'preprocessor'</span>, preprocessor),
        (<span class="hljs-string">'dt'</span>, DecisionTreeRegressor(
            max_depth=depth,
            random_state=<span class="hljs-number">42</span>))])

    start = time.time()
    dt_pipeline.fit(X_train, y_train)
    train_time = time.time() - start

    y_train_pred_dt = dt_pipeline.predict(X_train)
    y_test_pred_dt = dt_pipeline.predict(X_test)

    <span class="hljs-comment"># Score this tree's own predictions</span>
    dt_results.append({
        <span class="hljs-string">"Model"</span>: <span class="hljs-string">f"Decision Tree depth=<span class="hljs-subst">{depth}</span>"</span>,
        <span class="hljs-string">"Train R2"</span>: r2_score(y_train, y_train_pred_dt),
        <span class="hljs-string">"Test R2"</span>: r2_score(y_test, y_test_pred_dt),
        <span class="hljs-string">"MAE"</span>: mean_absolute_error(y_test, y_test_pred_dt),
        <span class="hljs-string">"RMSE"</span>: mean_squared_error(y_test, y_test_pred_dt) ** <span class="hljs-number">0.5</span>,
        <span class="hljs-string">"Training Time"</span>: train_time})
</code></pre>
<p>Deeper trees achieved perfect training scores but failed to generalize well.</p>
<hr />
<h2 id="heading-phase-3-model-evaluation-amp-comparison">Phase 3: Model Evaluation &amp; Comparison</h2>
<h3 id="heading-31-model-comparison-table">3.1 Model Comparison Table</h3>
<p>All models were evaluated side-by-side using consistent metrics.</p>
<pre><code class="lang-python">results_df = pd.DataFrame(
    [lr_results] + poly_results + svr_results + dt_results
)
results_df
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769012494137/388972d5-9650-4aff-beb3-363b70fb7fb1.png" alt class="image--center mx-auto" /></p>
<p>The comparison revealed that <strong>Linear Regression achieved the best test performance</strong>, with the highest R² and lowest RMSE.</p>
<hr />
<h3 id="heading-32-predicted-vs-actual-price-visualization">3.2 Predicted vs Actual Price Visualization</h3>
<p>Predicted prices were plotted against actual prices to visually assess model accuracy.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> r2_score

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plot_predicted_vs_actual</span>(<span class="hljs-params">y_true, y_pred, model_name</span>):</span>
    errors = np.abs(y_true - y_pred)
    r2 = r2_score(y_true, y_pred)

    plt.figure(figsize=(<span class="hljs-number">7</span>, <span class="hljs-number">6</span>))
    scatter = plt.scatter(y_true, y_pred, c=errors)
    plt.plot([y_true.min(), y_true.max()],
             [y_true.min(), y_true.max()],
             linestyle=<span class="hljs-string">'--'</span>)
    plt.xlabel(<span class="hljs-string">"Actual Price"</span>)
    plt.ylabel(<span class="hljs-string">"Predicted Price"</span>)
    plt.title(<span class="hljs-string">f"<span class="hljs-subst">{model_name}</span> | R² = <span class="hljs-subst">{r2:<span class="hljs-number">.3</span>f}</span>"</span>)
    plt.colorbar(scatter, label=<span class="hljs-string">"Prediction Error"</span>)
    plt.show()

<span class="hljs-comment"># Example usage (each call uses that model's own test predictions)</span>
plot_predicted_vs_actual(y_test, y_test_pred_lr, <span class="hljs-string">"Linear Regression"</span>)
plot_predicted_vs_actual(y_test, y_test_pred_poly, <span class="hljs-string">"Polynomial Regression"</span>)
plot_predicted_vs_actual(y_test, y_test_pred_svr, <span class="hljs-string">"Support Vector Regression"</span>)
plot_predicted_vs_actual(y_test, y_test_pred_dt, <span class="hljs-string">"Decision Tree Regression"</span>)
</code></pre>
<p>The closer the points are to the diagonal line, the better the model performance.</p>
<hr />
<h3 id="heading-33-residual-analysis">3.3 Residual Analysis</h3>
<p>Residual plots and histograms were used to analyze prediction errors.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">residual_analysis</span>(<span class="hljs-params">y_true, y_pred, model_name</span>):</span>
    residuals = y_true - y_pred

    <span class="hljs-comment"># Residuals vs Predicted</span>
    plt.figure(figsize=(<span class="hljs-number">7</span>, <span class="hljs-number">5</span>))
    plt.scatter(y_pred, residuals)
    plt.axhline(<span class="hljs-number">0</span>, linestyle=<span class="hljs-string">'--'</span>)
    plt.xlabel(<span class="hljs-string">"Predicted Price"</span>)
    plt.ylabel(<span class="hljs-string">"Residuals"</span>)
    plt.title(<span class="hljs-string">f"<span class="hljs-subst">{model_name}</span> Residuals vs Predicted"</span>)
    plt.show()

    <span class="hljs-comment"># Histogram of residuals</span>
    plt.figure(figsize=(<span class="hljs-number">7</span>, <span class="hljs-number">5</span>))
    plt.hist(residuals, bins=<span class="hljs-number">30</span>)
    plt.xlabel(<span class="hljs-string">"Residual"</span>)
    plt.ylabel(<span class="hljs-string">"Frequency"</span>)
    plt.title(<span class="hljs-string">f"<span class="hljs-subst">{model_name}</span> Residual Distribution"</span>)
    plt.show()

<span class="hljs-comment"># Apply to models (each call uses that model's own test predictions)</span>
residual_analysis(y_test, y_test_pred_lr, <span class="hljs-string">"Linear Regression"</span>)
residual_analysis(y_test, y_test_pred_poly, <span class="hljs-string">"Polynomial Regression"</span>)
residual_analysis(y_test, y_test_pred_svr, <span class="hljs-string">"SVR"</span>)
residual_analysis(y_test, y_test_pred_dt, <span class="hljs-string">"Decision Tree"</span>)
</code></pre>
<p>The residuals for Linear Regression were scattered randomly around zero with no visible pattern, which is consistent with the model's linearity and constant-variance assumptions.</p>
<hr />
<h2 id="heading-phase-4-model-selection-amp-business-application">Phase 4: Model Selection &amp; Business Application</h2>
<h3 id="heading-41-final-model-selection">4.1 Final Model Selection</h3>
<p>After evaluating accuracy, overfitting risk, interpretability, and computational efficiency, <strong>Linear Regression was selected as the final model</strong>.</p>
<p>It achieved:</p>
<ul>
<li><p>High test R² (≈ 0.935)</p>
</li>
<li><p>Lowest RMSE</p>
</li>
<li><p>Stable generalization</p>
</li>
<li><p>High interpretability for business users</p>
</li>
<li><p>Fast training and prediction times</p>
</li>
</ul>
<p>This balance makes it ideal for real-world deployment.</p>
<hr />
<h3 id="heading-42-price-prediction-for-new-cars">4.2 Price Prediction for New Cars</h3>
<p>The final model was used to predict prices for three hypothetical cars:</p>
<ul>
<li><p>A budget Toyota sedan</p>
</li>
<li><p>A low-mileage BMW luxury sedan</p>
</li>
<li><p>An older Ford with accident history</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.compose <span class="hljs-keyword">import</span> ColumnTransformer
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> OneHotEncoder
<span class="hljs-keyword">from</span> sklearn.impute <span class="hljs-keyword">import</span> SimpleImputer
<span class="hljs-keyword">from</span> sklearn.pipeline <span class="hljs-keyword">import</span> Pipeline

num_cols = [<span class="hljs-string">"Year"</span>, <span class="hljs-string">"Mileage"</span>, <span class="hljs-string">"Engine_Size"</span>, <span class="hljs-string">"Horsepower"</span>, <span class="hljs-string">"Previous_Owners"</span>]
cat_cols = [<span class="hljs-string">"Brand"</span>, <span class="hljs-string">"Fuel_Type"</span>, <span class="hljs-string">"Transmission"</span>, <span class="hljs-string">"Accident_History"</span>, <span class="hljs-string">"Service_Records"</span>]

numeric_tf = Pipeline([
    (<span class="hljs-string">"imputer"</span>, SimpleImputer(strategy=<span class="hljs-string">"median"</span>))
])

categorical_tf = Pipeline([
    (<span class="hljs-string">"imputer"</span>, SimpleImputer(strategy=<span class="hljs-string">"most_frequent"</span>)),
    (<span class="hljs-string">"encoder"</span>, OneHotEncoder(handle_unknown=<span class="hljs-string">"ignore"</span>))
])

preprocessor = ColumnTransformer([
    (<span class="hljs-string">"num"</span>, numeric_tf, num_cols),
    (<span class="hljs-string">"cat"</span>, categorical_tf, cat_cols)
])
</code></pre>
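<p>The prediction step itself might look like the sketch below. It is an illustration under stated assumptions, not the project's exact code: the training data is synthetic, the price formula is made up, and every feature value for the three cars other than the model year is hypothetical. The names <code>final_model</code> and <code>new_cars</code> are also assumptions.</p>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

num_cols = ["Year", "Mileage", "Engine_Size", "Horsepower", "Previous_Owners"]
cat_cols = ["Brand", "Fuel_Type", "Transmission", "Accident_History", "Service_Records"]

# Synthetic training data (assumption): stands in for the real 200-row dataset
rng = np.random.default_rng(42)
n = 200
train = pd.DataFrame({
    "Year": rng.integers(2005, 2024, n),
    "Mileage": rng.integers(5_000, 200_000, n),
    "Engine_Size": rng.uniform(1.0, 4.0, n).round(1),
    "Horsepower": rng.integers(90, 400, n),
    "Previous_Owners": rng.integers(0, 4, n),
    "Brand": rng.choice(["Toyota", "BMW", "Ford"], n),
    "Fuel_Type": rng.choice(["Petrol", "Diesel"], n),
    "Transmission": rng.choice(["Manual", "Automatic"], n),
    "Accident_History": rng.choice(["Yes", "No"], n),
    "Service_Records": rng.choice(["Yes", "No"], n),
})
# Made-up price formula, used only so the model has a target to fit
train["Price"] = (
    1_000 * (train["Year"] - 2000)
    - 0.05 * train["Mileage"]
    + 60 * train["Horsepower"]
    + rng.normal(0, 1_500, n)
)

# Same preprocessing structure as above: impute numerics, impute and one-hot encode categoricals
preprocessor = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), num_cols),
    ("cat", Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])
final_model = Pipeline([("preprocessor", preprocessor), ("model", LinearRegression())])
final_model.fit(train[num_cols + cat_cols], train["Price"])

# Hypothetical feature values for the three scenarios
new_cars = pd.DataFrame([
    {"Brand": "Toyota", "Year": 2015, "Mileage": 80_000, "Engine_Size": 1.8,
     "Horsepower": 130, "Fuel_Type": "Petrol", "Transmission": "Automatic",
     "Previous_Owners": 2, "Accident_History": "No", "Service_Records": "Yes"},
    {"Brand": "BMW", "Year": 2020, "Mileage": 15_000, "Engine_Size": 3.0,
     "Horsepower": 335, "Fuel_Type": "Petrol", "Transmission": "Automatic",
     "Previous_Owners": 1, "Accident_History": "No", "Service_Records": "Yes"},
    {"Brand": "Ford", "Year": 2012, "Mileage": 150_000, "Engine_Size": 2.0,
     "Horsepower": 160, "Fuel_Type": "Diesel", "Transmission": "Manual",
     "Previous_Owners": 3, "Accident_History": "Yes", "Service_Records": "No"},
])

predicted_prices = final_model.predict(new_cars[num_cols + cat_cols])
for brand, price in zip(new_cars["Brand"], predicted_prices):
    print(f"{brand}: {price:,.0f}")
```

<p>Because the preprocessing lives inside the pipeline, new cars are passed in as raw rows with the same column names as the training data, and encoding is applied consistently at prediction time. The numbers printed by this sketch come from synthetic data and will not match the figures in the summary that follows.</p>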
<h4 id="heading-prediction-summary">Prediction Summary</h4>
<ul>
<li><p>Toyota (2015): ≈ <strong>$29,107</strong></p>
</li>
<li><p>BMW (2020): ≈ <strong>$65,855</strong></p>
</li>
<li><p>Ford (2012): ≈ <strong>$9,980</strong></p>
</li>
</ul>
<p>The ordering matches market expectations: the low-mileage BMW is priced highest, and the older Ford with accident history is priced lowest.</p>
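<p>A minimal sketch of how predictions like these could be produced, assuming the preprocessor is chained with scikit-learn's <code>LinearRegression</code> in a single <code>Pipeline</code>. The training rows, the <code>Price</code> column, and the attributes of the three example cars are invented for illustration and are not the article's dataset, so the printed numbers will not match the summary.</p>

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

num_cols = ["Year", "Mileage", "Engine_Size", "Horsepower", "Previous_Owners"]
cat_cols = ["Brand", "Fuel_Type", "Transmission", "Accident_History", "Service_Records"]

# Toy training data: hypothetical values, for illustration only.
train = pd.DataFrame({
    "Year": [2012, 2015, 2018, 2020, 2016, 2019],
    "Mileage": [150000, 90000, 60000, 20000, 80000, 30000],
    "Engine_Size": [1.6, 1.8, 2.0, 3.0, 2.0, 2.5],
    "Horsepower": [110, 130, 150, 300, 160, 250],
    "Previous_Owners": [3, 2, 1, 1, 2, 1],
    "Brand": ["Ford", "Toyota", "Toyota", "BMW", "Ford", "BMW"],
    "Fuel_Type": ["Petrol", "Petrol", "Hybrid", "Petrol", "Diesel", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Automatic", "Automatic", "Manual", "Automatic"],
    "Accident_History": ["Yes", "No", "No", "No", "Yes", "No"],
    "Service_Records": ["Partial", "Full", "Full", "Full", "Partial", "Full"],
    "Price": [9000, 22000, 30000, 66000, 15000, 55000],  # assumed target column name
})

preprocessor = ColumnTransformer([
    ("num", Pipeline([("imputer", SimpleImputer(strategy="median"))]), num_cols),
    ("cat", Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])

# Preprocessing and regression live in one object, so new rows
# go through exactly the same transformations as the training rows.
model = Pipeline([("prep", preprocessor), ("reg", LinearRegression())])
model.fit(train[num_cols + cat_cols], train["Price"])

# Three hypothetical cars; column names must match the training data.
new_cars = pd.DataFrame({
    "Year": [2015, 2020, 2012],
    "Mileage": [95000, 15000, 160000],
    "Engine_Size": [1.8, 3.0, 1.6],
    "Horsepower": [130, 320, 105],
    "Previous_Owners": [2, 1, 3],
    "Brand": ["Toyota", "BMW", "Ford"],
    "Fuel_Type": ["Petrol", "Petrol", "Petrol"],
    "Transmission": ["Automatic", "Automatic", "Manual"],
    "Accident_History": ["No", "No", "Yes"],
    "Service_Records": ["Full", "Full", "Partial"],
})

predictions = model.predict(new_cars)
for label, price in zip(new_cars["Brand"], predictions):
    print(f"{label}: ${price:,.0f}")
```

<p>Keeping the encoder inside the pipeline with <code>handle_unknown="ignore"</code> means a brand that never appeared in training does not raise an error at prediction time, although its prediction then relies only on the remaining features.</p>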
<hr />
<h2 id="heading-business-insights-amp-recommendations">Business Insights &amp; Recommendations</h2>
<h3 id="heading-key-findings">Key Findings</h3>
<p>Brand, mileage, vehicle age, accident history, and engine performance are the strongest drivers of car prices. Premium brands retain value better, while high mileage and accident history significantly reduce resale value.</p>
<h3 id="heading-business-recommendations">Business Recommendations</h3>
<p>Dealers should prioritize low-mileage, accident-free vehicles with complete service records. Pricing strategies should leverage model predictions to identify undervalued listings and apply data-driven price adjustments. Customers should be educated on how mileage and maintenance history affect long-term value.</p>
<h3 id="heading-model-limitations">Model Limitations</h3>
<p>The linear regression model assumes linear relationships and does not capture complex interactions. It may struggle with rare brands, extreme market conditions, or heavily modified vehicles.</p>
<h3 id="heading-future-improvements">Future Improvements</h3>
<p>Future work could involve advanced models such as Gradient Boosting or Random Forest, incorporation of real transaction prices, location data, and periodic retraining to maintain accuracy.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>This project demonstrates how machine learning can be applied to solve a real business problem in the automotive industry. By combining solid data preprocessing, careful model evaluation, and business-focused interpretation, the resulting pricing system provides both technical reliability and practical value.</p>
]]></content:encoded></item><item><title><![CDATA[Data and Decisions: Building a Housing Price Prediction Model with Multiple Linear Regression]]></title><description><![CDATA[Introduction
Property valuation is one of the most critical tasks in real estate. Traditionally, this process relies heavily on human judgment and market intuition. While experience matters, data offers an opportunity to make pricing more consistent,...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/data-and-decisions-building-a-housing-price-prediction-model-with-multiple-linear-regression</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/data-and-decisions-building-a-housing-price-prediction-model-with-multiple-linear-regression</guid><category><![CDATA[Data Preprocessing]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Thu, 08 Jan 2026 13:58:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767880127214/5a7d2d03-cb64-4aca-ab1a-24aa99ca6b9d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-introduction">Introduction</h2>
<p>Property valuation is one of the most critical tasks in real estate. Traditionally, this process relies heavily on human judgment and market intuition. While experience matters, data offers an opportunity to make pricing more consistent, transparent, and defensible.</p>
<p>In this project, I worked as a data scientist in a real estate company tasked with developing a machine learning model to predict house prices based on property characteristics. The goal was not just to predict prices accurately, but also to understand what truly drives housing prices and translate those insights into real business value.</p>
<p>This assessment was structured into four phases, moving from raw data understanding to actionable business recommendations.</p>
<hr />
<h3 id="heading-objective">Objective</h3>
<p>The objective of this project was to apply core machine learning concepts end-to-end by building a multiple linear regression model that predicts house prices and then optimizing it through feature selection.</p>
<hr />
<h2 id="heading-dataset">Dataset</h2>
<p><strong>File:</strong> <code>Assessment-Dataset/housing_price_data.csv</code><br />The dataset contains information about house size, location, amenities, accessibility, and pricing.</p>
<hr />
<h2 id="heading-phase-1-data-understanding-and-preprocessing">Phase 1: Data Understanding and Preprocessing</h2>
<p>The project began with a thorough understanding of the dataset.</p>
<h3 id="heading-exploratory-data-analysis-eda">Exploratory Data Analysis (EDA)</h3>
<p>I first examined the dataset’s structure, size, and feature types to understand what kind of data I was working with. Statistical summaries were generated to observe the range, mean, and variability of numerical features. The distribution of the target variable (house price) was analyzed to check for skewness and outliers.</p>
<p>To understand relationships between variables, a correlation heatmap was created. This helped reveal which features had strong linear relationships with house prices and which ones contributed little signal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767877781509/76204b55-4b2c-4957-991f-d9264e8018d7.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 1.0</em> Target Variable Distribution</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878008857/1ba29e3e-ae92-49a3-9f5f-52411fa03200.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 1.1</em> Correlation heatmap</p>
<hr />
<h3 id="heading-data-quality-assessment">Data Quality Assessment</h3>
<p>Next, I checked for missing values and inconsistencies. Any missing entries were handled appropriately to prevent bias. Potential outliers were reviewed to ensure they represented valid market cases rather than data errors. All observations and decisions taken during this stage were documented to maintain transparency.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

df = pd.read_csv(<span class="hljs-string">"/kaggle/input/data-preprocessing-week-14/housing_price_data.csv"</span>)
df.head()
</code></pre>
<pre><code class="lang-python">df.shape
</code></pre>
<pre><code class="lang-python">df.info()
</code></pre>
<pre><code class="lang-python">df.describe()
</code></pre>
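<p>The quality checks described in this section can be sketched as follows. This is an illustration on a toy frame, not the article's exact procedure: the <code>Area</code> column name, the median fill, and the 1.5 × IQR rule are assumptions chosen because they are common defaults.</p>

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the housing data (hypothetical values).
toy = pd.DataFrame({
    "Area": [1200.0, 1500.0, np.nan, 2000.0, 9000.0, 1800.0],
    "House_Price": [150000, 180000, 210000, 260000, 1200000, 240000],
})

# Count missing values per column.
missing_counts = toy.isna().sum()

# Fill numeric gaps with the column median (one common choice, assumed here).
toy_filled = toy.fillna(toy.median(numeric_only=True))

# Flag potential outliers in the target with the 1.5 * IQR rule.
q1, q3 = toy_filled["House_Price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_mask = (toy_filled["House_Price"] < lower) | (toy_filled["House_Price"] > upper)

print(missing_counts)
print(toy_filled[outlier_mask])
```

<p>Flagged rows are candidates for review rather than automatic removal, which matches the goal of keeping valid market cases and discarding only data errors.</p>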
<h3 id="heading-preprocessing-pipeline">Preprocessing Pipeline</h3>
<p>Categorical variables such as Neighborhood, Garage, and Pool were encoded using dummy variables. To avoid the dummy variable trap, one category from each encoded variable was dropped.</p>
<p>The dataset was then split into 70% training data and 30% test data. Feature scaling was applied where necessary to ensure fair contribution of numerical variables to the regression model.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Preprocessing Pipeline (Reusable)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">preprocess_data</span>(<span class="hljs-params">df</span>):</span>
    <span class="hljs-comment"># One-hot encode the categorical columns; drop_first avoids the dummy variable trap</span>
    df_encoded = pd.get_dummies(
        df,
        columns=[<span class="hljs-string">"Neighborhood"</span>, <span class="hljs-string">"Garage"</span>, <span class="hljs-string">"Pool"</span>],
        drop_first=<span class="hljs-literal">True</span>,
    )
    <span class="hljs-comment"># Separate the features from the target</span>
    X = df_encoded.drop(<span class="hljs-string">"House_Price"</span>, axis=<span class="hljs-number">1</span>)
    y = df_encoded[<span class="hljs-string">"House_Price"</span>]

    <span class="hljs-keyword">return</span> X, y

X, y = preprocess_data(df)
</code></pre>
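<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

<span class="hljs-comment"># Train-Test Split (70/30)</span>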
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.3</span>, random_state=<span class="hljs-number">42</span>)
</code></pre>
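<p>The scaling step is not shown in the article. A minimal sketch, assuming scikit-learn's <code>StandardScaler</code> and synthetic stand-in features, is to fit the scaler on the training rows only and reuse its statistics on the test rows, so no information from the test set leaks into training.</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features on very different scales (hypothetical).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=[1500.0, 10.0], scale=[400.0, 3.0], size=(100, 2))
y_demo = rng.normal(size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from training rows only
X_te_scaled = scaler.transform(X_te)      # apply the same statistics to the test rows

print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
```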
<hr />
<h2 id="heading-phase-2-model-development">Phase 2: Model Development</h2>
<p>Two regression models were developed and compared.</p>
<h3 id="heading-model-1-multiple-linear-regression-all-features">Model 1: Multiple Linear Regression (All Features)</h3>
<p>The first model included all available features. It served as a baseline to understand overall performance before optimization. The model was trained on the training set and evaluated on the test set using standard regression metrics.</p>
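<p>A baseline of this kind could look like the sketch below. It assumes scikit-learn's <code>LinearRegression</code> and uses synthetic data in place of the housing features, so the scores are illustrative only.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship (hypothetical).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Fit on the training split, then score both splits.
baseline = LinearRegression().fit(X_tr, y_tr)
r2_train = baseline.score(X_tr, y_tr)
r2_test = baseline.score(X_te, y_te)

print(f"Train R2: {r2_train:.3f}  Test R2: {r2_test:.3f}")
```

<p>Comparing the training and test scores side by side is a quick first check for overfitting: a large gap suggests the model has memorised the training data.</p>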
<hr />
<h3 id="heading-model-2-optimized-multiple-linear-regression">Model 2: Optimized Multiple Linear Regression</h3>
<p>To improve interpretability and reduce noise, I applied backward elimination with a significance level of 0.05. Using statistical p-values, features that did not contribute meaningfully to the model were removed iteratively.</p>
<p>Each elimination step was documented and justified based on statistical evidence.</p>
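<pre><code class="lang-python"><span class="hljs-keyword">import</span> statsmodels.api <span class="hljs-keyword">as</span> sm

<span class="hljs-comment"># Feature Selection: Backward Elimination</span>
<span class="hljs-comment"># statsmodels OLS does not add an intercept automatically, so add a constant column</span>
X_sm = sm.add_constant(X.astype(float))
y_sm = y.astype(float)

X_opt = X_sm.copy()

<span class="hljs-comment"># Refit, then drop the feature with the highest p-value until none exceeds 0.05</span>
<span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>: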
    model = sm.OLS(y_sm, X_opt).fit()
    p_vals = model.pvalues
    max_p = p_vals.max()

    <span class="hljs-keyword">if</span> max_p &gt; <span class="hljs-number">0.05</span>:
        feature = p_vals.idxmax()
        X_opt = X_opt.drop(columns=[feature])
        print(<span class="hljs-string">f"Removed: <span class="hljs-subst">{feature}</span> (p = <span class="hljs-subst">{max_p:<span class="hljs-number">.4</span>f}</span>)"</span>)
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">break</span>
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment">#Model summary</span>
model.summary()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767879451776/78f46136-ce59-4d2f-91ad-e16ea1cc2c01.png" alt class="image--center mx-auto" /></p>
<p>Notes from the model summary:<br />[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.<br />[2] The condition number is large, 9.33e+04. This might indicate that there are strong multicollinearity or other numerical problems.</p>
<h3 id="heading-evaluation-metrics-used">Evaluation Metrics Used</h3>
<p>Both models were evaluated using:</p>
<ul>
<li><p>R² score</p>
</li>
<li><p>Adjusted R² score</p>
</li>
<li><p>Mean Absolute Error (MAE)</p>
</li>
<li><p>Mean Squared Error (MSE)</p>
</li>
<li><p>Root Mean Squared Error (RMSE)</p>
</li>
</ul>
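<p>The five metrics can be computed with a small helper like the one below. This is a sketch, not the article's own function: the name <code>regression_metrics</code> and the toy arrays are assumptions. Adjusted R² uses the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of features.</p>

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred, n_features):
    """Return [R2, adjusted R2, MAE, MSE, RMSE] in that order."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    # Adjusted R2 penalises R2 for the number of features used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = float(np.sqrt(mse))
    return [r2, adj_r2, mae, mse, rmse]


# Toy values (hypothetical) to show the call.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])
metrics = regression_metrics(y_true, y_pred, n_features=1)
print(metrics)
```

<p>A list in this order lines up with the metric names used in the model comparison table, so one call per model would yield the two columns to compare.</p>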
<hr />
<h2 id="heading-phase-3-model-evaluation-and-validation">Phase 3: Model Evaluation and Validation</h2>
<p>First, a model comparison table was created to clearly show performance differences between the initial and optimized models. The optimized model achieved better interpretability while maintaining strong predictive performance, with no clear signs of overfitting or underfitting.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Model Comparison Table</span>
comparison = pd.DataFrame({
    <span class="hljs-string">"Metric"</span>: [<span class="hljs-string">"R²"</span>, <span class="hljs-string">"Adjusted R²"</span>, <span class="hljs-string">"MAE"</span>, <span class="hljs-string">"MSE"</span>, <span class="hljs-string">"RMSE"</span>],
    <span class="hljs-string">"Full Model"</span>: metrics_full,
    <span class="hljs-string">"Optimized Model"</span>: metrics_opt,
})

comparison
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878585308/6f3b5541-42c1-46de-9d01-7731d6fa09ec.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 1.3</em> Model Comparison Table</p>
<p>Visual analysis played a key role in validating model behavior.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878654774/5dd401df-7d1c-48cc-b9bc-bb5f063a8700.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878680768/0d12013b-4fd3-4a02-bbb3-484620e43872.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 1.4</em> Predicted vs Actual price scatter plots (both models)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878761417/98a49adb-a7e3-434f-83a8-851539b71346.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 1.5</em> Residual plots</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878832180/2a921636-20ee-4b14-a602-e261cd1aa39b.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 1.6</em> Feature importance bar chart</p>
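<p>Diagnostic plots of this kind can be drawn with Matplotlib. The sketch below uses synthetic prices and predictions (hypothetical values), since the article's plotting code is not shown. It draws a predicted-versus-actual scatter with a reference diagonal and a residual plot centred on zero.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic actual and predicted prices (hypothetical).
rng = np.random.default_rng(1)
y_actual = rng.uniform(100_000, 500_000, size=50)
y_predicted = y_actual + rng.normal(scale=20_000, size=50)
residuals = y_actual - y_predicted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: points close to the dashed diagonal are well predicted.
ax1.scatter(y_actual, y_predicted, alpha=0.6)
lims = [y_actual.min(), y_actual.max()]
ax1.plot(lims, lims, linestyle="--", color="red")
ax1.set_xlabel("Actual price")
ax1.set_ylabel("Predicted price")
ax1.set_title("Predicted vs Actual")

# Right: residuals should scatter around zero with no visible pattern.
ax2.scatter(y_predicted, residuals, alpha=0.6)
ax2.axhline(0, linestyle="--", color="red")
ax2.set_xlabel("Predicted price")
ax2.set_ylabel("Residual")
ax2.set_title("Residuals")

fig.tight_layout()
fig.savefig("diagnostics.png")
```

<p>A funnel shape or a curve in the residual plot would suggest non-constant variance or a non-linear relationship that a linear model does not capture.</p>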
<hr />
<h2 id="heading-phase-4-business-insights-and-recommendations">Phase 4: Business Insights and Recommendations</h2>
<h2 id="heading-model-interpretation">Model Interpretation</h2>
<p>The optimized regression model indicates that house prices are largely driven by property size, neighborhood quality, and accessibility. House area emerged as the most influential variable, showing a strong positive relationship with price. This confirms a familiar market truth: larger homes consistently command higher prices.</p>
<p>Neighborhood quality, especially properties located in high-end or luxury areas, also had a significant positive influence. Buyers are clearly willing to pay a premium for better infrastructure, security, and social amenities.</p>
<p>Distance from the city center showed a negative effect on house prices. Homes located farther away tend to be less valuable due to reduced accessibility to economic and social opportunities. Property tax displayed a positive association with price, acting as a proxy for overall property value and municipal service quality. Bathrooms contributed moderately to price increases by improving comfort and functionality.</p>
<p>A notable and somewhat surprising finding was that the number of bedrooms had a weak impact once total house area was considered. This suggests buyers care more about usable space than room count. House age also showed minimal influence, implying that maintenance and location matter more than construction year. Amenities such as garages and pools add value, but not as strongly as size and location.</p>
<hr />
<h2 id="heading-business-applications-and-recommendations">Business Applications and Recommendations</h2>
<p>The real estate company can use this model as a decision-support tool for pricing properties more objectively. Agents can justify listing prices using data rather than intuition alone, increasing client trust. The model can also help identify undervalued properties with strong fundamentals, opening up profitable investment opportunities.</p>
<p>However, the model has limitations. It assumes linear relationships, which may oversimplify real housing markets. It does not account for macroeconomic factors such as interest rates, inflation, or income levels. Additionally, the lack of time-series data limits trend analysis.</p>
<p>Future improvements could include incorporating economic indicators, testing non-linear models like Random Forest or Gradient Boosting, and retraining the model periodically with updated data.</p>
<hr />
<h2 id="heading-sample-predictions-and-explanations">Sample Predictions and Explanations</h2>
<p>To demonstrate real-world usage, three hypothetical houses were evaluated:</p>
<ul>
<li><p><strong>House 1:</strong> A small house far from the city center in a standard neighborhood. The model predicts a relatively low price due to limited size, average location quality, and reduced accessibility.</p>
</li>
<li><p><strong>House 2:</strong> A medium-sized house in a luxury neighborhood at a moderate distance from the city center. The predicted price is higher than House 1, mainly driven by neighborhood quality.</p>
</li>
<li><p><strong>House 3:</strong> A large house in a luxury neighborhood close to the city center. This house receives the highest predicted price because it combines all major value-driving features.</p>
</li>
</ul>
<p>These predictions highlight how the model balances different features to arrive at realistic price estimates.</p>
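<p>When new houses are scored, their dummy columns must line up with the columns the model saw during training. The sketch below shows one way to do that with <code>DataFrame.reindex</code>. The <code>Area</code> and <code>Distance_km</code> column names and all values are hypothetical, and the fitted model is a stand-in for the article's optimized model.</p>

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy training data (hypothetical values and column names).
train = pd.DataFrame({
    "Area": [900, 1500, 2400, 1100, 2000, 3000],
    "Distance_km": [25, 12, 5, 20, 10, 3],
    "Neighborhood": ["Standard", "Luxury", "Luxury", "Standard", "Standard", "Luxury"],
    "House_Price": [90000, 260000, 480000, 120000, 250000, 600000],
})
X_train = pd.get_dummies(
    train.drop(columns="House_Price"),
    columns=["Neighborhood"],
    drop_first=True,
    dtype=float,
)
reg = LinearRegression().fit(X_train, train["House_Price"])

# Three hypothetical houses to score.
new_houses = pd.DataFrame({
    "Area": [800, 1600, 2800],
    "Distance_km": [30, 12, 4],
    "Neighborhood": ["Standard", "Luxury", "Luxury"],
})

# Encode every category present, then keep exactly the training columns.
# Columns missing from the new data are filled with 0; extra ones are dropped.
X_new = pd.get_dummies(new_houses, columns=["Neighborhood"], dtype=float)
X_new = X_new.reindex(columns=X_train.columns, fill_value=0)

predicted = reg.predict(X_new)
print(pd.Series(predicted, index=["House 1", "House 2", "House 3"]).round(0))
```

<p>The new rows are encoded without <code>drop_first</code> on purpose: if a small batch happened to contain only one category, <code>drop_first=True</code> would drop that category's column and the rows would silently be treated as the baseline category.</p>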
<hr />
<p>This project developed a housing price prediction model using multiple linear regression and feature optimization. The final model provides strong explanatory power and actionable insights. Property size, neighborhood quality, and proximity to the city center emerged as the most important pricing factors, making the model valuable for real estate valuation and strategic decision-making.</p>
<hr />
<h2 id="heading-appendix">Appendix</h2>
<ul>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878008857/1ba29e3e-ae92-49a3-9f5f-52411fa03200.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878654774/5dd401df-7d1c-48cc-b9bc-bb5f063a8700.png" alt class="image--center mx-auto" /></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878680768/0d12013b-4fd3-4a02-bbb3-484620e43872.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878761417/98a49adb-a7e3-434f-83a8-851539b71346.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767878832180/2a921636-20ee-4b14-a602-e261cd1aa39b.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<p>These visualizations support the conclusions drawn and reinforce confidence in the model.</p>
<hr />
<h3 id="heading-useful-resources-in-this-article">Useful Resources in This Article</h3>
<ul>
<li><p>Real-world housing price dataset for regression analysis</p>
</li>
<li><p>Step-by-step EDA with distributions and correlation heatmaps</p>
</li>
<li><p>Data preprocessing workflow (encoding, scaling, train–test split)</p>
</li>
<li><p>Reusable Python functions for preprocessing, training, and evaluation</p>
</li>
<li><p>Multiple Linear Regression and optimized model using backward elimination</p>
</li>
<li><p>Clear explanation of evaluation metrics (R², MAE, MSE, RMSE)</p>
</li>
<li><p>Model diagnostics: predicted vs actual plots, residuals, feature importance</p>
</li>
<li><p>Business insights translated from model results</p>
</li>
</ul>
<hr />
<p><em>cover photo credit</em>: Pinterest (pngtree)</p>
]]></content:encoded></item><item><title><![CDATA[World Population Analysis]]></title><description><![CDATA[Introduction
Population plays a central role in shaping economies, infrastructure, healthcare systems, education planning, and long-term development strategies. While global population figures are often discussed as a single headline number, the unde...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/world-population-analysis</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/world-population-analysis</guid><category><![CDATA[web scraping]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[analysis]]></category><category><![CDATA[visualization]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Thu, 18 Dec 2025 08:51:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766047679072/b7e0eb6d-9c9f-4d0f-9537-0c3c4256f76b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Population plays a central role in shaping economies, infrastructure, healthcare systems, education planning, and long-term development strategies. While global population figures are often discussed as a single headline number, the underlying distribution across countries tells a much more detailed story.</p>
<p>Some countries account for a significant share of the world’s population, while many others contribute relatively small proportions. Understanding this imbalance is essential for effective planning, policy formulation, and sustainable development.</p>
<p>In this article, I explore global population distribution using publicly available data scraped from Wikipedia. Through data cleaning, analysis, and visualization, the goal is to uncover patterns in population concentration, examine how population declines by country rank, and highlight what these trends imply for stakeholders.</p>
<hr />
<h2 id="heading-data-source-and-setup">Data Source and Setup</h2>
<p>The dataset was obtained from Wikipedia’s world population tables using web scraping techniques in Python.</p>
<h3 id="heading-data-source">Data Source</h3>
<ul>
<li><p>Wikipedia: <a target="_blank" href="https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population">List of countries and dependencies by population</a></p>
</li>
<li><p>Data type: Country-level population estimates</p>
</li>
<li><p>Coverage: Over 200 countries and territories</p>
</li>
</ul>
<h3 id="heading-tools-used">Tools Used</h3>
<ul>
<li><p>Python</p>
</li>
<li><p>Requests (for fetching HTML content)</p>
</li>
<li><p>Pandas (for data manipulation)</p>
</li>
<li><p>Matplotlib (for visualization)</p>
</li>
</ul>
<hr />
<h2 id="heading-web-scraping-of-data">Web Scraping of Data</h2>
<pre><code class="lang-python"><span class="hljs-comment"># requests + pandas.read_html used to scrape population tables</span>
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> StringIO

<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt

url = <span class="hljs-string">"https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population"</span>

headers = {<span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"</span>}

response = requests.get(url, headers=headers, timeout=<span class="hljs-number">30</span>)
response.raise_for_status()

<span class="hljs-comment"># Wrap the HTML in StringIO: recent pandas versions deprecate passing a raw HTML string</span>
tables = pd.read_html(StringIO(response.text))
population_df = tables[<span class="hljs-number">0</span>]   <span class="hljs-comment"># main population table</span>

population_df.head()
</code></pre>
<hr />
<h2 id="heading-data-cleaning-and-preparation">Data Cleaning and Preparation</h2>
<p>To ensure accurate analysis, the following steps were applied:</p>
<ul>
<li><p>Removed the global aggregate entry (“World”) from country-level analysis to avoid double counting.</p>
</li>
<li><p>Ranked countries by population size.</p>
</li>
<li><p>Created cumulative population metrics for further analysis.</p>
</li>
</ul>
<p>These steps ensured the dataset was consistent and suitable for statistical exploration.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Keep only the relevant columns</span>
population_df = population_df[[<span class="hljs-string">"Location"</span>, <span class="hljs-string">"Population"</span>, <span class="hljs-string">"% of world"</span>, <span class="hljs-string">"Date"</span>]]

<span class="hljs-comment"># Drop the global aggregate row so it is not counted alongside individual countries</span>
population_df = population_df[population_df[<span class="hljs-string">"Location"</span>] != <span class="hljs-string">"World"</span>]
</code></pre>
<pre><code class="lang-python">population_df = population_df.sort_values(by=<span class="hljs-string">"Population"</span>, ascending=<span class="hljs-literal">False</span>)

population_df.head()
population_df.tail()
</code></pre>
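<p>The ranking and cumulative metrics listed in the cleaning steps are not shown in code. A minimal sketch on toy figures follows. The <code>Rank</code>, <code>Share_pct</code>, and <code>Cumulative_pct</code> column names are assumptions, and the populations are invented rather than scraped.</p>

```python
import pandas as pd

# Toy figures (hypothetical, not the scraped values).
toy = pd.DataFrame({
    "Location": ["A", "B", "C", "D"],
    "Population": [500, 300, 150, 50],
})

# Rank from largest to smallest population.
toy = toy.sort_values("Population", ascending=False).reset_index(drop=True)
toy["Rank"] = toy.index + 1

# Share of the total, and the running total of that share by rank.
toy["Share_pct"] = 100 * toy["Population"] / toy["Population"].sum()
toy["Cumulative_pct"] = toy["Share_pct"].cumsum()

print(toy)
```

<p>Plotting <code>Cumulative_pct</code> against <code>Rank</code> gives a cumulative contribution curve, and plotting <code>Population</code> against <code>Rank</code> gives a rank-decay view.</p>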
<hr />
<h2 id="heading-population-share-analysis">Population Share Analysis</h2>
<p>A population share visualization was created to show how the world’s population is distributed among the most populous countries.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766045889514/1f8fd302-040f-4aaa-9705-564fd7bc17d2.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 1: Pie Chart - Population Share of Top 10 Countries</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766045819409/dabe96da-2d14-41cd-a110-32cb16c987f4.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 2: Bar Chart - Top 10 Most Populous Countries</em></p>
<h3 id="heading-insight">Insight</h3>
<p>The visualization shows that a small number of countries account for a very large share of the global population. India and China alone represent a substantial proportion, while most other countries each contribute a comparatively small fraction.</p>
<p>This highlights how population is far from evenly distributed across the globe.</p>
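<p>One way the inputs for a share chart could be prepared is to keep the top N countries and collapse everything else into a single remainder slice. This is a sketch on invented numbers; whether the article's pie chart included such a remainder slice is not stated, and the <code>"Rest of world"</code> label is an assumption.</p>

```python
import pandas as pd

# Toy populations (hypothetical).
toy = pd.Series({"A": 500, "B": 300, "C": 100, "D": 60, "E": 40}, name="Population")

top_n = 2
ranked = toy.sort_values(ascending=False)

# Keep the top N and sum the remaining countries into one slice.
top = ranked.head(top_n)
rest = ranked.iloc[top_n:].sum()
pie_data = pd.concat([top, pd.Series({"Rest of world": rest})])

share_pct = (100 * pie_data / pie_data.sum()).round(1)
print(share_pct)
```

<p>Calling <code>pie_data.plot.pie(autopct="%1.1f%%")</code> would then draw the chart, with the remainder slice making the concentration in the top countries visible at a glance.</p>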
<hr />
<h2 id="heading-cumulative-population-contribution">Cumulative Population Contribution</h2>
<p>To understand how population accumulates as more countries are considered, a cumulative population curve was plotted based on country rank.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766046112267/ed2523cf-f1b6-41f1-9cbc-fc0f3b713724.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 3: Line Plot - Cumulative Global Population Contribution</em></p>
<h3 id="heading-insight-1">Insight</h3>
<p>The curve rises sharply at the beginning and gradually flattens. This means that the top-ranked countries contribute most of the global population, while additional countries add smaller increments.</p>
<p>In practical terms, a relatively small group of countries determines most global population outcomes.</p>
<hr />
<h2 id="heading-population-decay-by-country-rank">Population Decay by Country Rank</h2>
<p>Population size was plotted against country rank to observe how quickly population declines as rank increases.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766046591834/03367d75-0a04-4fe4-8edf-c95aec371374.png" alt class="image--center mx-auto" /></p>
<p><em>Figure 4: Scatter Plot - Population Decay by Country Rank</em></p>
<h3 id="heading-insight-2">Insight</h3>
<p>The pattern shows a steep decline in population after the first few ranks, followed by a long tail of countries with much smaller populations. This type of distribution is common in large-scale systems and reflects structural imbalance rather than random variation.</p>
<hr />
<h2 id="heading-key-findings">Key Findings</h2>
<p>From the analysis, several important points emerge:</p>
<ul>
<li><p>Global population is highly concentrated in a small number of countries.</p>
</li>
<li><p>Most countries contribute only a small fraction to the total population.</p>
</li>
<li><p>Population distribution follows a consistent decay pattern by rank.</p>
</li>
<li><p>These trends are likely to persist without major structural changes.</p>
</li>
</ul>
<hr />
<h2 id="heading-stakeholder-implications">Stakeholder Implications</h2>
<h3 id="heading-policymakers">Policymakers</h3>
<p>Population-heavy countries require focused attention in infrastructure development, healthcare provision, and education planning. Broad policies that ignore population concentration risk inefficiency.</p>
<h3 id="heading-development-organizations">Development Organizations</h3>
<p>Targeting high-population regions can improve the impact of development initiatives, while smaller countries benefit from tailored approaches.</p>
<h3 id="heading-urban-and-regional-planners">Urban and Regional Planners</h3>
<p>Population concentration increases pressure on cities and surrounding regions. Long-term planning must account for projected population trends.</p>
<h3 id="heading-economists-and-investors">Economists and Investors</h3>
<p>Population size influences market potential, labor availability, and future growth. Demographic patterns provide valuable context for economic decision-making.</p>
<hr />
<h2 id="heading-limitations">Limitations</h2>
<ul>
<li><p>Population figures are estimates and may change over time.</p>
</li>
<li><p>The analysis does not account for migration shocks, environmental factors, or sudden demographic changes.</p>
</li>
<li><p>Country-level data masks important regional and urban differences.</p>
</li>
</ul>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>This analysis provides a clear view of how the world’s population is distributed and why that distribution matters. Rather than being evenly spread, global population is concentrated in a small number of countries, shaping economic, social, and developmental outcomes.</p>
<p>Understanding these patterns helps stakeholders make informed decisions and plan more effectively for the future. Population data, when examined closely, offers valuable insights into how societies are structured and how they may evolve.</p>
<hr />
<h2 id="heading-authors-reflection">Author’s Reflection</h2>
<p>Working through this project reinforced the importance of looking beyond headline numbers. Seeing how quickly population declines after the most populous countries provided a deeper understanding of global imbalance.</p>
<p>This analysis showed that data is more than a collection of figures. It is a way to understand systems, challenge assumptions, and support better decision-making. Population trends, in particular, offer a powerful lens through which to view global development.</p>
<hr />
<h2 id="heading-useful-resources">Useful Resources</h2>
<p>In this article, readers will find:</p>
<ul>
<li><p>A web-scraped global population dataset</p>
</li>
<li><p>Clear data cleaning and preparation steps</p>
</li>
<li><p>Multiple visualizations beyond simple bar charts</p>
</li>
<li><p>Insights into population concentration and inequality</p>
</li>
<li><p>Stakeholder-focused interpretation of results</p>
</li>
</ul>
<hr />
<p><em>Thank you for reading. If you found this analysis helpful or have questions, feel free to share your thoughts.</em></p>
<p><em>cover photo credit : Freepik.com</em></p>
]]></content:encoded></item><item><title><![CDATA[A Look at Nigeria's Mobile Phone Trade: Imports and Exports in 2023]]></title><description><![CDATA[Introduction
Every modern economy depends on the steady movement of goods across its borders. In today’s connected world, mobile phones are more than luxury items; they are essential tools for communication, business, education, and access to digital...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/a-look-at-nigerias-mobile-phone-trade-</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/a-look-at-nigerias-mobile-phone-trade-</guid><category><![CDATA[#DataAnalysis #NigeriaEconomy #UNComtrade #MobilePhones #InternationalTrade #DataVisualization #AfricaTech]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Tue, 18 Nov 2025 22:05:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763503427186/2f5d961d-d9f8-46b0-b348-80d367bdb903.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-introduction">Introduction</h2>
<p>Every modern economy depends on the steady movement of goods across its borders. In today’s connected world, mobile phones are more than luxury items; they are essential tools for communication, business, education, and access to digital services. Understanding how a country trades mobile phones can reveal deeper insights into its technological landscape, market dependency, and global partnerships.</p>
<p>Using 2023 data from the United Nations Comtrade database, this project examines Nigeria’s trade activity in mobile phones, focusing on the product category HS Code 8517: telephone sets, including telephones for cellular networks or other wireless networks.</p>
<p>The analysis aims to answer key questions.<br />How much does Nigeria import and export? Who are the major trading partners? Which countries trade consistently? Are there bidirectional trading relationships? What monthly patterns exist? And how do Nigeria’s biggest import sources compare to global totals?</p>
<p>This study provides a comprehensive overview of Nigeria’s mobile phone trade ecosystem for the year 2023.</p>
<h2 id="heading-data-source-and-setup">Data Source and Setup</h2>
<p>All data was retrieved from the UN Comtrade database with the following parameters.</p>
<p>Type of Data: Goods<br />Reporter: Nigeria<br />Period: All months in 2023<br />Partners: All available<br />Trade Flow: Imports and Exports<br />Commodity: HS Code 8517<br />Frequency: Monthly</p>
<p>The downloaded dataset was loaded with pandas:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

df = pd.read_csv(<span class="hljs-string">"Nigeria_MobilePhones_2023.csv"</span>, encoding=<span class="hljs-string">"latin1"</span>)
df.head()
</code></pre>
<p>The dataset was then divided into import and export subsets:</p>
<pre><code class="lang-python">imports = MobilePhone_countries[MobilePhone_countries[<span class="hljs-string">"Trade Flow"</span>] == <span class="hljs-string">"Import"</span>]
exports = MobilePhone_countries[MobilePhone_countries[<span class="hljs-string">"Trade Flow"</span>] == <span class="hljs-string">"Export"</span>]
</code></pre>
<p>This separation made it easier to explore each component of Nigeria’s trade individually.</p>
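<p>The snippet above filters <code>MobilePhone_countries</code>, which is not defined in the code shown. A minimal sketch of one plausible definition, assuming it is <code>df</code> with the <code>World</code> aggregate row removed so that partner totals are not counted twice. This is an assumption rather than the confirmed original step, and the sample values are made up:</p>

```python
import pandas as pd

# Tiny illustrative frame: values are invented, only the column names follow the article.
df = pd.DataFrame({
    "Partner": ["World", "China", "Sweden", "World", "Zimbabwe"],
    "Trade Flow": ["Import", "Import", "Import", "Export", "Export"],
    "Trade Value (US$)": [150.0, 100.0, 50.0, 2.0, 2.0],
})

# Assumption: drop the "World" aggregate so each dollar is counted once.
MobilePhone_countries = df[df["Partner"] != "World"]

imports = MobilePhone_countries[MobilePhone_countries["Trade Flow"] == "Import"]
exports = MobilePhone_countries[MobilePhone_countries["Trade Flow"] == "Export"]

print(imports["Trade Value (US$)"].sum())
print(exports["Trade Value (US$)"].sum())
```

<p>If the aggregate row were kept, every total computed from <code>imports</code> would be roughly doubled, which is why the filter matters.</p>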
<h2 id="heading-total-imports-and-total-exports">Total Imports and Total Exports</h2>
<h3 id="heading-total-imports-in-2023">Total Imports in 2023</h3>
<p>Total imports were calculated using:</p>
<pre><code class="lang-python">total_imports_by_year = imports.groupby(<span class="hljs-string">"Year"</span>)[<span class="hljs-string">"Trade Value (US$)"</span>].sum()
</code></pre>
<p>Result:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Year</td><td>Total Imports (US$)</td></tr>
</thead>
<tbody>
<tr>
<td>2023</td><td>655,014,300</td></tr>
</tbody>
</table>
</div><h3 id="heading-total-exports-in-2023">Total Exports in 2023</h3>
<p>Total exports were calculated using:</p>
<pre><code class="lang-python">total_exports_year = exports.groupby(<span class="hljs-string">"Year"</span>)[<span class="hljs-string">"Trade Value (US$)"</span>].sum()
</code></pre>
<p>Result:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Year</td><td>Total Exports (US$)</td></tr>
</thead>
<tbody>
<tr>
<td>2023</td><td>2,104.21</td></tr>
</tbody>
</table>
</div><h3 id="heading-trade-balance">Trade Balance</h3>
<p>The trade balance is simply:</p>
<p>Balance = Exports − Imports<br />Balance ≈ 2,104 − 655,014,300<br />Balance ≈ −655 million USD</p>
<p>Nigeria’s mobile phone trade in 2023 shows a very large deficit. Imports dominate overwhelmingly, with practically no export activity.</p>
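<p>The arithmetic can be reproduced directly from the two totals reported above. This is a small sketch; the variable names are illustrative and not taken from the original notebook:</p>

```python
# Totals as reported in the tables above (US$).
total_imports = 655_014_300.00
total_exports = 2_104.21

# Trade balance: exports minus imports (negative means a deficit).
balance = total_exports - total_imports

# Share of the import bill that export earnings would cover.
coverage = total_exports / total_imports

print(f"Balance: {balance:,.2f} US$")
print(f"Exports cover {coverage:.6%} of imports")
```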
<h2 id="heading-main-trade-partners">Main Trade Partners</h2>
<h3 id="heading-top-import-partners">Top Import Partners</h3>
<p>These are the countries Nigeria buys the most mobile phones from:</p>
<pre><code class="lang-python">imports_by_country = imports.groupby(<span class="hljs-string">"Partner"</span>)[<span class="hljs-string">"Trade Value (US$)"</span>].sum()
top_country = imports_by_country.sort_values(ascending=<span class="hljs-literal">False</span>).head(<span class="hljs-number">5</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Rank</td><td>Partner</td><td>Import Value (US$)</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>China</td><td>496,363,200</td></tr>
<tr>
<td>2</td><td>Sweden</td><td>56,484,860</td></tr>
<tr>
<td>3</td><td>Mexico</td><td>19,519,240</td></tr>
<tr>
<td>4</td><td>USA</td><td>19,346,690</td></tr>
<tr>
<td>5</td><td>China, Hong Kong SAR</td><td>10,074,570</td></tr>
</tbody>
</table>
</div><p>China is the dominant supplier in Nigeria’s mobile phone market. On its own, it accounts for roughly 76% of import value (about 496 million of the 655 million US$ total).</p>
<h3 id="heading-top-export-partner">Top Export Partner</h3>
<p>Nigeria’s exports are minimal and flow to only one country:</p>
<pre><code class="lang-python">exports_by_country = exports.groupby(<span class="hljs-string">"Partner"</span>)[<span class="hljs-string">"Trade Value (US$)"</span>].sum()
top_export_country = exports_by_country.sort_values(ascending=<span class="hljs-literal">False</span>).head(<span class="hljs-number">5</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Rank</td><td>Partner</td><td>Export Value (US$)</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>Zimbabwe</td><td>2,104.21</td></tr>
</tbody>
</table>
</div><p>Mobile phone exports from Nigeria are almost nonexistent.</p>
<h2 id="heading-regular-customers">Regular Customers</h2>
<p>Regular customers are partners that appear in the trade records every month of the year.</p>
<p>The calculation:</p>
<pre><code class="lang-python">regular_customers = df.groupby(<span class="hljs-string">"Partner"</span>)[<span class="hljs-string">"Month"</span>].nunique()
regular_customers = regular_customers[regular_customers == <span class="hljs-number">12</span>]
</code></pre>
<p>Result:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Partner</td></tr>
</thead>
<tbody>
<tr>
<td>China</td></tr>
<tr>
<td>World</td></tr>
</tbody>
</table>
</div><p>“World” is the aggregate of all partners rather than a country, so China is the only individual trading partner that appears in all twelve months. This highlights how deeply Nigeria’s mobile phone market depends on Chinese imports.</p>
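<p>Because <code>World</code> is an aggregate row rather than a country, it can be filtered out before counting active months. A minimal sketch on made-up rows, assuming the same <code>Partner</code> and <code>Month</code> column names as above:</p>

```python
import pandas as pd

months = list(range(1, 13))

# Invented records: "World" and "China" appear every month, "Sweden" only in three.
rows = (
    [{"Partner": "World", "Month": m} for m in months]
    + [{"Partner": "China", "Month": m} for m in months]
    + [{"Partner": "Sweden", "Month": m} for m in (1, 2, 6)]
)
df = pd.DataFrame(rows)

# Exclude the aggregate, then count distinct months per partner.
countries_only = df[df["Partner"] != "World"]
months_active = countries_only.groupby("Partner")["Month"].nunique()

# Keep partners present in all twelve months.
regular = months_active[months_active == 12].index.tolist()
print(regular)
```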
<h2 id="heading-countries-with-both-imports-and-exports">Countries with Both Imports and Exports</h2>
<p>To check if Nigeria had any two-way trade partners:</p>
<pre><code class="lang-python">import_countries = set(imports[<span class="hljs-string">"Partner"</span>])
export_countries = set(exports[<span class="hljs-string">"Partner"</span>])
bidirectional = import_countries &amp; export_countries
</code></pre>
<p>Result:</p>
<pre><code class="lang-python">set()
</code></pre>
<p>No country appears as both an import source and an export destination for Nigeria’s mobile phones in 2023.</p>
<h2 id="heading-monthly-import-and-export-trends">Monthly Import and Export Trends</h2>
<h3 id="heading-monthly-import-totals">Monthly Import Totals</h3>
<pre><code class="lang-python">monthly_imports = imports.groupby(<span class="hljs-string">"Month"</span>)[<span class="hljs-string">"Trade Value (US$)"</span>].sum().sort_values(ascending=<span class="hljs-literal">False</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Month</td><td>Import Value (US$)</td></tr>
</thead>
<tbody>
<tr>
<td>12</td><td>93,369,930</td></tr>
<tr>
<td>6</td><td>81,661,610</td></tr>
<tr>
<td>8</td><td>80,932,270</td></tr>
<tr>
<td>11</td><td>62,044,820</td></tr>
<tr>
<td>5</td><td>53,551,740</td></tr>
<tr>
<td>10</td><td>53,031,720</td></tr>
<tr>
<td>7</td><td>51,138,010</td></tr>
<tr>
<td>9</td><td>49,726,210</td></tr>
<tr>
<td>2</td><td>41,802,410</td></tr>
<tr>
<td>3</td><td>40,219,080</td></tr>
<tr>
<td>4</td><td>25,027,430</td></tr>
<tr>
<td>1</td><td>22,509,030</td></tr>
</tbody>
</table>
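</div><p>Imports peak in December (about 93.4 million US$), with June and August close behind, and are lowest in January and April. The second half of the year is heavier overall, which may reflect end-of-year consumer demand and promotions, although a single year of data cannot confirm a seasonal cycle.</p>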
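<p>The table above is sorted by value, which makes the calendar pattern harder to see. A short sketch that re-sorts the same reported figures by month and compares the two halves of the year (the variable names are illustrative):</p>

```python
import pandas as pd

# Monthly import values as reported in the table above (US$), keyed by month number.
monthly_imports = pd.Series({
    12: 93_369_930, 6: 81_661_610, 8: 80_932_270, 11: 62_044_820,
    5: 53_551_740, 10: 53_031_720, 7: 51_138_010, 9: 49_726_210,
    2: 41_802_410, 3: 40_219_080, 4: 25_027_430, 1: 22_509_030,
})

# Calendar order instead of value order.
by_calendar = monthly_imports.sort_index()

# Label-based slices on the sorted index are inclusive at both ends.
first_half = by_calendar.loc[1:6].sum()
second_half = by_calendar.loc[7:12].sum()

print(by_calendar)
print(f"Jan-Jun: {first_half:,} US$")
print(f"Jul-Dec: {second_half:,} US$")
```

<p>On these figures, the July to December total is noticeably larger than the January to June total, although June on its own is the second-highest month.</p>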
<h3 id="heading-monthly-exports">Monthly Exports</h3>
<p>Exports appear only in the month of August:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Month</td><td>Export Value (US$)</td></tr>
</thead>
<tbody>
<tr>
<td>August</td><td>2,104.21</td></tr>
</tbody>
</table>
</div><p>Exports were very low: Nigeria exported to just one country, in a single month.</p>
<h2 id="heading-top-three-importers-compared-to-total-world-trade">Top Three Importers Compared to Total World Trade</h2>
<pre><code class="lang-python">total_world = imports[<span class="hljs-string">"Trade Value (US$)"</span>].sum()

top3 = imports.groupby(<span class="hljs-string">"Partner"</span>)[<span class="hljs-string">"Trade Value (US$)"</span>].sum() \
              .sort_values(ascending=<span class="hljs-literal">False</span>).head(<span class="hljs-number">3</span>)
</code></pre>
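<p>A bar chart was created with:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt

plt.bar(top3.index, top3.values)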
plt.title(<span class="hljs-string">"Top 3 Countries Exporting Mobile Phones to Nigeria (2023)"</span>)
plt.ylabel(<span class="hljs-string">"Trade Value (US$)"</span>)
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763502516961/020c2f58-4418-4159-8f1b-1b6931b5137d.png" alt class="image--center mx-auto" /></p>
<p>This visual emphasizes how far ahead China is compared to other suppliers. The gap is extremely wide, reflecting China’s dominance in Nigeria’s mobile phone supply chain.</p>
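<p>The comparison with the overall total can also be expressed as percentage shares. A minimal sketch using the figures reported in the tables above, with illustrative variable names:</p>

```python
import pandas as pd

# Top three import partners and the overall import total, as reported above (US$).
top3 = pd.Series(
    {"China": 496_363_200, "Sweden": 56_484_860, "Mexico": 19_519_240},
    name="Trade Value (US$)",
)
total_world = 655_014_300

# Each partner's share of total import value, in percent.
share = (top3 / total_world * 100).round(1)

# Combined share of the three largest suppliers.
combined = round(top3.sum() / total_world * 100, 1)

print(share)
print(f"Top 3 combined: {combined}%")
```

<p>On these numbers, China alone supplies about three quarters of import value, and the top three suppliers together account for close to 90%.</p>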
<h2 id="heading-conclusion">Conclusion</h2>
<p>This analysis provides a clear picture of Nigeria’s mobile phone trade landscape in 2023. The evidence shows:</p>
<p>Nigeria is primarily an importer, bringing in over 655 million dollars in mobile phones while exporting just slightly above two thousand dollars.<br />China plays an outsized role, supplying most of the devices sold in Nigeria and appearing in trade records every single month.<br />There are no countries with both import and export interactions, highlighting a one-directional trade pattern.<br />Imports show a noticeable seasonal trend with peaks toward the end of the year.<br />Exports are extremely rare, concentrated in only one month and one partner.</p>
<p>Nigeria’s mobile phone trade structure is heavily imbalanced. The country depends almost entirely on external suppliers, particularly China, for mobile communication technology. Understanding this pattern is valuable for policymakers, investors, technology analysts, and anyone interested in the structure of Nigeria’s digital economy. Expanded datasets or multi-year comparisons would add even deeper insights and help reveal long-term trends.</p>
<hr />
<h2 id="heading-authors-reflection">Author’s Reflection</h2>
<p>Working with this dataset forced me to confront a reality that many of us in Nigeria already feel, even if we do not often quantify it. Our economy is heavily tilted toward consumption. We buy, import, distribute, and retail, but when it comes to producing the technology we use every day, the numbers reveal just how little activity exists on the domestic side.</p>
<p>Seeing more than six hundred and fifty million dollars flowing outward in phone imports while only two thousand dollars came in as exports was not just surprising, it was unsettling. It showed how disconnected our consumption appetite is from our production capacity. It also raised a deeper question. How sustainable is an economy where almost everything we depend on is manufactured elsewhere?</p>
<p>Exploring the data made the issue even clearer. China’s presence throughout the entire year illustrates how dependent Nigeria is on a single global player for essential technology. The absence of any partner that both buys from and sells to Nigeria shows the one-directional nature of our participation in the global mobile phone market. We import. We consume. And then the cycle repeats.</p>
<p>This reflection is not about criticizing Nigeria for what it lacks, but about acknowledging the opportunity that lies within these numbers. A country of more than two hundred million people with one of the largest youth populations in the world should not remain only a marketplace for other nations' innovations. The demand already exists. The market size is undeniable. What is missing is the foundation for local production: infrastructure, incentives, research, manufacturing ecosystems, and long-term investment in technology.</p>
<p>This analysis reminded me that data is not just a collection of values. It is a mirror that reflects how a society functions. By seeing the gap clearly, we gain a better understanding of what must change if Nigeria is to move from being a consumer-driven economy to one that creates, builds, and exports value to the world.</p>
<hr />
<h2 id="heading-useful-resources">Useful Resources</h2>
<p>In this analysis, readers should expect a clear and data-driven exploration of Nigeria’s mobile phone trade ecosystem using official monthly records from the United Nations Comtrade database. The goal is to provide both technical insight and practical understanding. Specifically, the analysis covers the following:</p>
<ol>
<li><p>A breakdown of Nigeria’s total mobile phone imports and exports for 2023, showing the scale of trade activity.</p>
</li>
<li><p>Identification of Nigeria’s largest trading partners and how much each country contributes to overall imports and exports.</p>
</li>
<li><p>Discovery of consistent monthly trading partners to reveal long-term patterns.</p>
</li>
<li><p>Examination of whether Nigeria has any partners with two-way phone trade involving both imports and exports.</p>
</li>
<li><p>A month-by-month trend analysis to help readers understand seasonal patterns in import demand.</p>
</li>
<li><p>A comparison of Nigeria’s top three import sources with global totals to highlight how concentrated the market is.</p>
</li>
<li><p>A discussion of Nigeria’s significant trade imbalance and what it means for production capacity, consumption habits, and economic vulnerability.</p>
</li>
<li><p>Insightful data visuals that make the trends easier to interpret.</p>
</li>
</ol>
<p>This section equips the reader with all the necessary context to appreciate the depth of the analysis and understand the larger story behind Nigeria’s dependence on imported mobile phones.</p>
]]></content:encoded></item><item><title><![CDATA[How Renewable Energy Impacts CO₂ Emissions: A Data-Driven Exploration]]></title><description><![CDATA[“Clean data can help drive a cleaner planet.”

Introduction
Breathing in Carbon: Why Renewable Energy Matters More Than Ever
Every breath we take tells a story, and lately, that story has been getting darker.Air pollution kills more than seven millio...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/how-renewable-energy-impacts-co-emissions-a-data-driven-ex</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/how-renewable-energy-impacts-co-emissions-a-data-driven-ex</guid><category><![CDATA[#RenewableEnergy  #CO2Emissions  #DataAnalysis  #WorldDevelopmentIndicators  #Sustainability  #SpearmanCorrelation  #DataVisualization]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Tue, 11 Nov 2025 22:08:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762898565678/645ca9e8-6bba-4a6d-a8f6-e2ff3b0acf8d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<blockquote>
<p>“Clean data can help drive a cleaner planet.”</p>
</blockquote>
<h2 id="heading-introduction">Introduction</h2>
<h3 id="heading-breathing-in-carbon-why-renewable-energy-matters-more-than-ever">Breathing in Carbon: Why Renewable Energy Matters More Than Ever</h3>
<p>Every breath we take tells a story, and lately, that story has been getting darker.<br />Air pollution kills more than seven million people each year, more than tuberculosis, malaria, and hepatitis combined. Those tiny particles released by cars, power plants, and factories do not just cloud the sky; they enter our lungs, our blood, and even our economy.</p>
<p>In 2023 alone, humanity lost over 512 billion work hours due to extreme heat, a direct result of a warming planet. Construction workers, farmers, and outdoor laborers, especially in low-income countries, are bearing the brunt of this crisis. The economic cost of climate disasters over the past decade has already surpassed two trillion dollars and continues to rise.</p>
<p>Yet, despite all this, the world continues to subsidize fossil fuels far more than renewables—about seven trillion dollars in 2022 compared to just 168 billion dollars for clean energy.<br />Our global budget still rewards pollution more than prevention.</p>
<p>There is a paradox here. On one hand, the technology to solve much of this already exists. Solar energy is now the cheapest source of electricity on Earth, tied with onshore wind. On the other hand, our energy systems remain stuck in the past, powered by the same carbon-heavy fuels driving our climate and health crises.</p>
<p>This led me to a question:<br />Are countries that use more renewable energy actually producing less CO₂?<br />Can we see a measurable, data-backed relationship between clean energy use and carbon emissions?</p>
<p>To explore this, I turned to data from the World Bank’s World Development Indicators for 2013. Using Python, I analyzed how renewable energy consumption relates to CO₂ emissions per capita across all available countries.<br />The results, both visual and statistical, offer a clear picture of where the world stood and what that tells us about our clean energy future.</p>
<hr />
<h2 id="heading-understanding-the-question-does-clean-energy-really-cut-emissions">Understanding the Question: Does Clean Energy Really Cut Emissions?</h2>
<p>At the heart of every major environmental discussion lies one deceptively simple question:<br />If we use more renewable energy, will we emit less carbon?</p>
<p>It sounds obvious—solar and wind do not produce CO₂ when generating electricity, while coal, oil, and gas do. But the real world is rarely that straightforward. Industrial activity, population size, energy efficiency, and geography all play a part.<br />To truly understand this relationship, I turned to the data.</p>
<h3 id="heading-objective">Objective</h3>
<p>The goal of this project was to explore how renewable energy consumption influences CO₂ emissions across countries.<br />In plain terms: Do nations that rely more on renewables tend to have smaller carbon footprints per person?</p>
<p>By analyzing this relationship, we can get a clearer sense of how effective renewable energy adoption really is in reducing emissions and which countries are leading or lagging in the global transition to cleaner energy.</p>
<hr />
<h2 id="heading-data-sources">Data Sources</h2>
<p>The analysis is based on data from the World Bank’s World Development Indicators (WDI), one of the most widely used and trusted global datasets for economic and environmental analysis.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Indicator</td><td>Code</td><td>Description</td></tr>
</thead>
<tbody>
<tr>
<td>CO₂ emissions (metric tons per capita)</td><td>EN.GHG.CO2.PC.CE.AR5</td><td>The average amount of CO₂ emitted per person in a country.</td></tr>
<tr>
<td>Renewable energy consumption (% of total final energy consumption)</td><td>EG.FEC.RNEW.ZS</td><td>The percentage of energy derived from renewable sources such as solar, wind, hydro, and bioenergy.</td></tr>
</tbody>
</table>
</div><p><strong>Year analyzed:</strong> 2013<br /><strong>Countries included:</strong> All available with valid data.</p>
<p>The year 2013 provides a clear snapshot of global energy use before the post-2015 renewable boom, offering insight into baseline patterns before major international climate policies such as the Paris Agreement took effect.</p>
<hr />
<h2 id="heading-data-preparation-process">Data Preparation Process</h2>
<p>The datasets were prepared and merged in Python using the Pandas library.</p>
<ol>
<li><p>Loaded both datasets into Pandas DataFrames.</p>
</li>
<li><p>Removed unnecessary columns such as indicator codes and years.</p>
</li>
<li><p>Merged them on country name.</p>
</li>
<li><p>Excluded regional aggregates (like “Sub-Saharan Africa”).</p>
</li>
<li><p>Dropped incomplete entries to maintain data integrity.</p>
</li>
</ol>
<p>After cleaning, the final dataset showed each country’s renewable energy share alongside CO₂ emissions per capita, ready for exploration.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

<span class="hljs-comment"># Load datasets</span>
co2 = pd.read_csv(<span class="hljs-string">"/kaggle/input/pandas-dataset-2/CO2_2013_ready.csv"</span>)
renew = pd.read_csv(<span class="hljs-string">"/kaggle/input/pandas-dataset-2/RENEW_2013_ready.csv"</span>)

<span class="hljs-comment"># Drop 'year' column</span>
co2 = co2.drop(columns=[<span class="hljs-string">'year'</span>], errors=<span class="hljs-string">'ignore'</span>)
renew = renew.drop(columns=[<span class="hljs-string">'year'</span>], errors=<span class="hljs-string">'ignore'</span>)

<span class="hljs-comment"># Merge on 'country'</span>
merged = pd.merge(co2, renew, on=<span class="hljs-string">'country'</span>, how=<span class="hljs-string">'inner'</span>)

<span class="hljs-comment"># Drop missing or invalid values</span>
merged = merged.dropna()

<span class="hljs-comment"># Save and preview</span>
merged.to_csv(<span class="hljs-string">"CO2_RENEW_merged_2013.csv"</span>, index=<span class="hljs-literal">False</span>)
print(<span class="hljs-string">"Merged dataset saved as CO2_RENEW_merged_2013.csv"</span>)

merged.head()
</code></pre>
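<p>Step 4 of the list above (excluding regional aggregates) is not shown in the snippet. A minimal sketch of how it could be done, assuming the cleaned frame is the <code>merged_clean</code> used in the later correlation code. The list of aggregate names here is hypothetical and incomplete, and the sample values are made up:</p>

```python
import pandas as pd

# Invented sample rows using the column names from the article.
merged = pd.DataFrame({
    "country": ["Nigeria", "Sub-Saharan Africa", "Norway", "World"],
    "co2_per_capita": [0.6, 0.8, 8.5, 4.7],
    "renewable_energy_percent": [86.0, 70.0, 57.0, 18.0],
})

# Hypothetical, incomplete set of aggregate labels; the real WDI export has many more.
aggregates = {"Sub-Saharan Africa", "World"}

# Keep only rows whose country name is not a known aggregate.
merged_clean = merged[~merged["country"].isin(aggregates)].reset_index(drop=True)

print(merged_clean)
```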
<hr />
<h2 id="heading-results-and-visualization-insights">Results and Visualization Insights</h2>
<p>Once the data was cleaned and ready, I began the analysis.</p>
<p>Across all countries:</p>
<ul>
<li><p>Average CO₂ emissions per person: approximately 4.8 metric tons</p>
</li>
<li><p>Average renewable energy share: approximately 29.4 percent</p>
</li>
</ul>
<p>That might sound decent, but beneath the surface, differences between nations were enormous.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Make sure both columns are numeric; unparseable values become NaN</span>
merged_clean[<span class="hljs-string">'renewable_energy_percent'</span>] = pd.to_numeric(merged_clean[<span class="hljs-string">'renewable_energy_percent'</span>], errors=<span class="hljs-string">'coerce'</span>)
merged_clean[<span class="hljs-string">'co2_per_capita'</span>] = pd.to_numeric(merged_clean[<span class="hljs-string">'co2_per_capita'</span>], errors=<span class="hljs-string">'coerce'</span>)

<span class="hljs-comment"># Basic summary stats</span>
summary = merged_clean[[<span class="hljs-string">'renewable_energy_percent'</span>, <span class="hljs-string">'co2_per_capita'</span>]].describe()
median_values = merged_clean[[<span class="hljs-string">'renewable_energy_percent'</span>, <span class="hljs-string">'co2_per_capita'</span>]].median()


summary_table = summary.T  <span class="hljs-comment"># transposed for better readability</span>
summary_table[<span class="hljs-string">'median'</span>] = median_values.values
summary_table
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762897439866/bde4bc5d-4b1c-479e-9ec9-7ebeeda89e76.png" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-what-the-numbers-say">What the Numbers Say</h3>
<p>Some countries were already leading the way, powering their economies with more than 80 percent renewable energy.<br />Others had virtually none.</p>
<p>High-income nations generally had higher CO₂ emissions per person, even if they had started adopting renewables.<br />Developing countries tended to rely more on renewables, often out of necessity—via hydropower or biomass—rather than advanced policy.</p>
<hr />
<h3 id="heading-the-highest-emitters-in-2013">The Highest Emitters in 2013</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Country</td><td>CO₂ Emissions (t CO₂e per capita)</td><td>Renewable Energy (%)</td></tr>
</thead>
<tbody>
<tr>
<td>Palau</td><td>103.37</td><td>0.0</td></tr>
<tr>
<td>Qatar</td><td>53.29</td><td>0.0</td></tr>
<tr>
<td>Trinidad and Tobago</td><td>27.82</td><td>0.4</td></tr>
<tr>
<td>Kuwait</td><td>26.56</td><td>0.0</td></tr>
<tr>
<td>Bahrain</td><td>25.64</td><td>0.0</td></tr>
<tr>
<td>United Arab Emirates</td><td>25.34</td><td>0.1</td></tr>
<tr>
<td>Saudi Arabia</td><td>19.81</td><td>0.0</td></tr>
<tr>
<td>Luxembourg</td><td>19.00</td><td>5.7</td></tr>
<tr>
<td>Brunei Darussalam</td><td>18.94</td><td>0.0</td></tr>
<tr>
<td>Oman</td><td>18.42</td><td>0.0</td></tr>
</tbody>
</table>
</div><p>Most of these nations are oil-rich economies, where energy production and national revenue depend heavily on fossil fuels. Their renewable shares are close to zero, which is consistent with their extremely high emission levels.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762897574590/ed9ccb1c-524d-4a98-8c91-d5a5af9557e7.png" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-observing-the-pattern">Observing the Pattern</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> scipy.stats <span class="hljs-keyword">import</span> spearmanr

renewableColumn = merged_clean[<span class="hljs-string">'renewable_energy_percent'</span>]
co2Column = merged_clean[<span class="hljs-string">'co2_per_capita'</span>]

<span class="hljs-comment"># Spearman rank correlation, computed once and reused below</span>
(correlation, pValue) = spearmanr(renewableColumn, co2Column)

print(<span class="hljs-string">'The correlation between Renewable Energy Consumption and CO₂ Emissions per Capita is'</span>, correlation)
print(<span class="hljs-string">f"P-value: <span class="hljs-subst">{pValue}</span>"</span>)
<span class="hljs-keyword">if</span> pValue &lt; <span class="hljs-number">0.05</span>:
    print(<span class="hljs-string">'It is statistically significant.'</span>)
<span class="hljs-keyword">else</span>:
    print(<span class="hljs-string">'It is not statistically significant.'</span>)
</code></pre>
<p><strong>Spearman correlation coefficient:</strong> –0.573<br /><strong>P-value:</strong> 3.44 × 10⁻¹⁹</p>
<p>This indicates a moderately strong, statistically significant negative correlation: countries with a larger renewable share tend to produce less CO₂ per person. Correlation alone does not establish that renewables cause the lower emissions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762897827627/6899191d-b4fd-479b-924d-e0244022f589.png" alt class="image--center mx-auto" /></p>
<p>Scatter plot showing the negative correlation between renewable energy and CO₂ emissions.</p>
<p>The scatter plot visually confirms the statistical finding:<br />There is a clear inverse relationship between renewable energy use and CO₂ emissions per person.</p>
<p>However, it also highlights that renewables alone don’t tell the full story.<br />Economic structure, industrialization, and population all influence the exact position of each country on the plot.</p>
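<p>Spearman’s coefficient is the Pearson correlation computed on ranks, so it measures whether one variable consistently falls as the other rises, regardless of the shape of the curve. A small sketch on invented numbers (not the WDI data) that shows the difference:</p>

```python
import pandas as pd

# Invented values: y falls steeply and non-linearly as x rises.
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([100, 30, 10, 3, 1])

# Pearson measures linear association on the raw values.
pearson = x.corr(y)

# Spearman is Pearson applied to the ranks of each variable.
spearman = x.rank().corr(y.rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

<p>Here the ranks are perfectly reversed, so Spearman is exactly −1 even though the relationship is far from a straight line. That property makes it a reasonable choice for skewed indicators such as per-capita emissions.</p>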
<hr />
<h3 id="heading-why-it-matters">Why It Matters</h3>
<p>The analysis is consistent with what sustainability experts have long argued.<br />Scaling renewable energy is associated with lower per-capita emissions across countries.</p>
<p>However, renewables alone are not enough.<br />Energy efficiency, industrial reform, and equitable policy changes all play vital roles.<br />Without these, even renewable-heavy countries can remain moderately carbon-intensive.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<h3 id="heading-the-data-speaks-and-so-should-we">The Data Speaks and So Should We</h3>
<p>The takeaway from this study is clear.<br />Countries investing in renewables emit less carbon per person.</p>
<p>With a correlation of –0.573, the relationship is moderately strong and statistically significant, suggesting that a larger share of solar, wind, and hydro power goes hand in hand with lower per-capita emissions.</p>
<p>Still, while technology has advanced, policy and investment have not kept pace. Fossil fuels continue to receive the majority of global subsidies, making it harder for clean energy to compete.</p>
<p>The good news is that solar and wind are already the cheapest sources of electricity in history.<br />The future of clean energy is not a distant goal—it is something we are already building.</p>
<hr />
<h2 id="heading-what-we-can-do">What We Can Do</h2>
<ul>
<li><p>Advocate for clean energy policies and equitable subsidies.</p>
</li>
<li><p>Support data transparency and open datasets.</p>
</li>
<li><p>Invest in research that combines data science with sustainability.</p>
</li>
<li><p>Promote energy efficiency alongside renewable adoption.</p>
</li>
</ul>
<p>Data alone will not save the planet, but people who use data can.</p>
<hr />
<h2 id="heading-authors-reflection-why-i-care-about-this">Author’s Reflection: Why I Care About This</h2>
<p>This project was more than a data analysis exercise. It was personal.<br />I have always been intrigued by how we can reduce CO₂ emissions, not just in theory but in practice. The danger it poses to our planet—from rising heat to air pollution—is no longer a distant threat. It is here, and it is accelerating.</p>
<p>Before this analysis, I worked on developing a prediction model for viable partial replacements of cement, one of the largest industrial contributors to CO₂ emissions. That experience taught me how deeply emissions are tied to the materials and systems we depend on, and how innovation in one sector can influence others.</p>
<p>It also showed me that data and sustainability are not separate fields. They are deeply connected, because understanding the data behind climate change is the first step to solving it.</p>
<p>Every dataset, every analysis, and every model brings us closer to a more sustainable world—one where we live not just on the planet, but with it.</p>
<hr />
<h2 id="heading-project-resources">Project Resources</h2>
<ul>
<li><p>Data source: <a target="_blank" href="https://databank.worldbank.org/source/world-development-indicators">World Development Indicators (World Bank)</a></p>
</li>
<li><p>Kaggle notebook containing the analysis: <a target="_blank" href="https://www.kaggle.com/code/ogunyemiezekiel/this-is-my-week-8-new">Renewable Energy Use and CO₂ Emission</a></p>
</li>
<li><p>Cleaned datasets:</p>
<ul>
<li><p><code>CO2_2013_ready.csv</code></p>
</li>
<li><p><code>RENEW_2013_ready.csv</code></p>
</li>
<li><p><code>CO2_RENEW_merged_2013.csv</code></p>
</li>
</ul>
</li>
<li><p>Python scripts: For data cleaning, merging, and correlation analysis</p>
</li>
<li><p>Visualizations: Scatter plots, bar charts, and descriptive summaries</p>
</li>
</ul>
<hr />
<p>If you found this project insightful, consider leaving a comment, sharing it with your network, or connecting with me. I am always eager to discuss sustainability, data, and the power of clean innovation.</p>
<hr />
]]></content:encoded></item><item><title><![CDATA[Finding the Perfect Summer Break with Data: A Weather Analysis of London ☀️]]></title><description><![CDATA[Introduction
We all love good weather, especially when it aligns with our plans. No one enjoys booking a long-awaited holiday just to spend it indoors because of constant rain. That thought guided this project. I wanted to use data to answer a simple...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/finding-the-perfect-summer-break-with-data-a-weather-analysis-of-london</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/finding-the-perfect-summer-break-with-data-a-weather-analysis-of-london</guid><category><![CDATA[#DataAnalysis  #Python  #Pandas  #DataScience  #Visualization  #WeatherData  #ProjectShowcase]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Sun, 02 Nov 2025 19:47:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762112725130/10bc1cdc-0ff6-4420-b78d-80020ad654d4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-introduction">Introduction</h2>
<p>We all love good weather, especially when it aligns with our plans. No one enjoys booking a long-awaited holiday just to spend it indoors because of constant rain. That thought guided this project. I wanted to use data to answer a simple human question:<br />When is the best time to go on a vacation in London?</p>
<p>In this project, I explored London’s 2023 weather data to find the most comfortable two-week stretch to take a summer vacation. The dataset came from Meteostat, a platform that provides open access to historical weather and climate data.</p>
<hr />
<h2 id="heading-data-description">Data description</h2>
<p>The dataset covered one full year of London’s daily weather, from temperatures to rainfall, sunshine, and wind measurements.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Column</td><td>Meaning</td><td>Unit</td></tr>
</thead>
<tbody>
<tr>
<td>date</td><td>Observation date</td><td>—</td></tr>
<tr>
<td>tavg</td><td>Mean air temperature</td><td>°C</td></tr>
<tr>
<td>tmin</td><td>Minimum air temperature</td><td>°C</td></tr>
<tr>
<td>tmax</td><td>Maximum air temperature</td><td>°C</td></tr>
<tr>
<td>prcp</td><td>Total precipitation</td><td>mm</td></tr>
<tr>
<td>snow</td><td>Snow depth</td><td>mm</td></tr>
<tr>
<td>wdir</td><td>Wind direction</td><td>° (empty for 2023)</td></tr>
<tr>
<td>wspd</td><td>Average wind speed</td><td>km/h</td></tr>
<tr>
<td>wpgt</td><td>Peak wind gust</td><td>km/h</td></tr>
<tr>
<td>pres</td><td>Sea-level air pressure</td><td>hPa</td></tr>
<tr>
<td>tsun</td><td>Sunshine duration</td><td>minutes</td></tr>
</tbody>
</table>
</div><p>Each column provided a clue about London’s weather patterns and helped build a realistic idea of when the weather feels most pleasant.</p>
<hr />
<h2 id="heading-step-1-data-preparation-and-cleaning">Step 1: Data Preparation and Cleaning</h2>
<p>Before bringing the dataset into Python, I made sure it looked right in Excel. At first, all the values were packed into one column, so I used:</p>
<p><strong>Data → Text to Columns → Delimited → Comma ( , ) → Finish</strong></p>
<p>That simple step split everything neatly into separate columns, making the file readable.</p>
<p>Once the data looked fine, I loaded it into <strong>Pandas</strong> for a proper clean-up. I used the parameter <code>skipinitialspace=True</code> while importing, which removed any extra spaces that appeared after commas in the CSV file.</p>
<p>Next, I converted the <strong>“date”</strong> column with <code>pd.to_datetime()</code> so Python could recognize it as an actual date instead of plain text. After that, I set the date as the index to make it easier to work with time-based operations like grouping by months or weeks.</p>
<p>By this point, the dataset was clean, organized, and ready for deeper analysis.</p>
<p>Here’s the code I used:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

<span class="hljs-comment"># Loading the dataset</span>
london_2023 = pd.read_csv(<span class="hljs-string">"London_2023.csv"</span>, skipinitialspace=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Converting 'date' column to datetime format</span>
london_2023[<span class="hljs-string">'date'</span>] = pd.to_datetime(london_2023[<span class="hljs-string">'date'</span>], errors=<span class="hljs-string">'coerce'</span>)

<span class="hljs-comment"># Setting 'date' as the index for time-based analysis</span>
london_2023.set_index(<span class="hljs-string">'date'</span>, inplace=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Quick check</span>
print(london_2023.info())
print(london_2023.head())
</code></pre>
<hr />
<h2 id="heading-step-2-focusing-on-the-summer-and-creating-a-comfort-score">Step 2: Focusing on the Summer and Creating a Comfort Score</h2>
<p>Since London is in the Northern Hemisphere, I decided to focus on the summer months: June, July, and August.<br />The idea was to find the two-week window that combined warm temperatures, low rainfall, and good sunshine.</p>
<p>But how do you define “good weather”?<br />To make that measurable, I created what I called a comfort score, a simple formula to rate each day based on three main factors:</p>
<ul>
<li><p><strong>Temperature</strong> (40%) – How close it was to the ideal 22°C.</p>
</li>
<li><p><strong>Rainfall</strong> (30%) – Less rainfall means higher comfort.</p>
</li>
<li><p><strong>Sunshine</strong> (30%) – More sunshine adds to comfort.</p>
</li>
</ul>
<p>Each of these factors was normalized between 0 and 1, then combined with weights to calculate a daily comfort score.<br />Finally, I used a 14-day rolling average to find longer periods of consistently good weather.</p>
<p>Here’s the code that brought it all together:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-comment"># Filtering for June, July, August</span>
summer = london_2023[london_2023.index.month.isin([<span class="hljs-number">6</span>, <span class="hljs-number">7</span>, <span class="hljs-number">8</span>])]
print(summer.head())

<span class="hljs-comment"># Creating a copy of summer dataframe</span>
summer = summer.copy()

<span class="hljs-comment"># Ideal temperature for comfort</span>
ideal_temp = <span class="hljs-number">22</span>

<span class="hljs-comment"># Normalizing and calculating component scores</span>
summer.loc[:, <span class="hljs-string">'temp_score'</span>] = <span class="hljs-number">1</span> - (abs(summer[<span class="hljs-string">'tavg'</span>] - ideal_temp) / ideal_temp)
summer.loc[:, <span class="hljs-string">'rain_score'</span>] = <span class="hljs-number">1</span> - (summer[<span class="hljs-string">'prcp'</span>] / summer[<span class="hljs-string">'prcp'</span>].max())
summer.loc[:, <span class="hljs-string">'sun_score'</span>] = summer[<span class="hljs-string">'tsun'</span>] / summer[<span class="hljs-string">'tsun'</span>].max()

<span class="hljs-comment"># Clipping each score to the 0-1 range (negative values become 0)</span>
summer.loc[:, [<span class="hljs-string">'temp_score'</span>, <span class="hljs-string">'rain_score'</span>, <span class="hljs-string">'sun_score'</span>]] = summer[[<span class="hljs-string">'temp_score'</span>, <span class="hljs-string">'rain_score'</span>, <span class="hljs-string">'sun_score'</span>]].clip(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>)

<span class="hljs-comment"># Weighted comfort score</span>
summer.loc[:, <span class="hljs-string">'comfort_score'</span>] = (
    <span class="hljs-number">0.4</span> * summer[<span class="hljs-string">'temp_score'</span>] +
    <span class="hljs-number">0.3</span> * summer[<span class="hljs-string">'rain_score'</span>] +
    <span class="hljs-number">0.3</span> * summer[<span class="hljs-string">'sun_score'</span>])

<span class="hljs-comment"># Rolling mean (14 days)</span>
summer.loc[:, <span class="hljs-string">'rolling_comfort'</span>] = summer[<span class="hljs-string">'comfort_score'</span>].rolling(window=<span class="hljs-number">14</span>).mean()

<span class="hljs-comment"># Finding the best 2-week window</span>
best_start = summer[<span class="hljs-string">'rolling_comfort'</span>].idxmax()
best_end = best_start + pd.Timedelta(days=<span class="hljs-number">13</span>)

print(<span class="hljs-string">"Best vacation period:"</span>, best_start.date(), <span class="hljs-string">"to"</span>, best_end.date())
</code></pre>
<p>By combining temperature, rainfall, and sunshine into a single value, I could see which days felt the most balanced and pleasant overall.</p>
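<p>One detail worth checking in the rolling step: by default, pandas labels each rolling window with its last row, so <code>idxmax()</code> on a 14-day rolling mean returns the day on which a window ends. The sketch below uses a small made-up series (not the London data) to show that behaviour and one way the start date could be derived from it.</p>

```python
import pandas as pd

# Toy data (made up for illustration): 20 days, with 4-17 June scoring highest
dates = pd.date_range("2023-06-01", periods=20, freq="D")
scores = pd.Series([0.2] * 3 + [0.9] * 14 + [0.2] * 3, index=dates)

# Default rolling windows are right-aligned: each value is the mean of
# that day and the 13 days before it
rolling = scores.rolling(window=14).mean()

window_end = rolling.idxmax()                      # last day of the best window
window_start = window_end - pd.Timedelta(days=13)  # first day of the best window

print("Best window:", window_start.date(), "to", window_end.date())
```

<p>Passing <code>center=True</code> to <code>rolling()</code> is another option when the label should sit in the middle of the window instead.</p>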
<hr />
<h2 id="heading-step-4-visualizing-the-findings">Step 4: Visualizing the Findings</h2>
<p>To make the results more meaningful, I created a line chart showing the comfort score over time. Then, I shaded the best two-week window that the data identified.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt

plt.figure(figsize=(<span class="hljs-number">12</span>, <span class="hljs-number">6</span>))

<span class="hljs-comment"># Plot daily comfort score</span>
plt.plot(summer.index, summer[<span class="hljs-string">'comfort_score'</span>], 
         label=<span class="hljs-string">'Daily Comfort Score'</span>, color=<span class="hljs-string">'skyblue'</span>, alpha=<span class="hljs-number">0.6</span>)

<span class="hljs-comment"># Plot 14-day rolling comfort score</span>
plt.plot(summer.index, summer[<span class="hljs-string">'rolling_comfort'</span>], 
         label=<span class="hljs-string">'14-day Rolling Average'</span>, color=<span class="hljs-string">'blue'</span>, linewidth=<span class="hljs-number">2</span>)

<span class="hljs-comment"># Highlight best vacation period</span>
plt.axvspan(best_start, best_end, color=<span class="hljs-string">'green'</span>, alpha=<span class="hljs-number">0.3</span>, label=<span class="hljs-string">'Best Vacation Period'</span>)

<span class="hljs-comment"># Labels and title</span>
plt.title(<span class="hljs-string">'Daily and Rolling Comfort Score – Summer 2023'</span>)
plt.xlabel(<span class="hljs-string">'Date'</span>)
plt.ylabel(<span class="hljs-string">'Comfort Score'</span>)
plt.legend()
plt.grid(<span class="hljs-literal">True</span>)

plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762111879353/4c60019c-de67-4047-ad12-68c8267bb55e.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-step-5-results">Step 5: Results</h2>
<p>The analysis found that the best time to take a summer break in London was from <strong>June 27 to July 10, 2023.</strong></p>
<p>During these two weeks, the temperature stayed between <strong>20–23°C</strong>, rainfall was low, and sunshine hours were longer. It was the ideal balance of all three factors.</p>
<p>Here are the top three vacation periods based on the rolling comfort score:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Rank</td><td>Dates</td><td>Score</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>June 27 – July 10</td><td>0.833</td></tr>
<tr>
<td>2</td><td>June 26 – July 9</td><td>0.833</td></tr>
<tr>
<td>3</td><td>June 17 – June 30</td><td>0.825</td></tr>
</tbody>
</table>
</div><p>These windows represent stretches of mild temperatures, minimal rain, and plenty of sunshine, exactly what most people hope for during a London summer.</p>
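<p>A ranking like the one in the table can be pulled from a rolling series with <code>nlargest</code>. This is a minimal sketch, assuming a Series shaped like <code>summer['rolling_comfort']</code>; the scores here are random placeholders, not the London results.</p>

```python
import numpy as np
import pandas as pd

# Placeholder daily scores for June-August 2023 (random, for illustration only)
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2023-06-01", "2023-08-31", freq="D")
comfort = pd.Series(rng.uniform(0, 1, len(dates)), index=dates)

rolling = comfort.rolling(window=14).mean()

# Three highest rolling scores; the NaN values from the first 13 days are ignored
top3 = rolling.nlargest(3).round(3)

for rank, (label, score) in enumerate(top3.items(), start=1):
    print(rank, label.date(), score)
```

<p>Neighbouring labels describe windows that overlap by 13 days, which is why near-identical date ranges can sit next to each other in a ranking like this.</p>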
<hr />
<h2 id="heading-challenges-i-faced">Challenges I Faced</h2>
<p>Like most projects, this one didn’t go smoothly at first.</p>
<ul>
<li><p>I started by downloading 12 separate CSV files from Weather Underground, one for each month of 2023. Merging them manually was stressful and prone to errors. After a few hours of frustration, I switched to Meteostat, which offered a better alternative.</p>
</li>
<li><p>I also had to think carefully about how to balance the comfort score, since giving too much weight to one factor could skew the results.</p>
</li>
<li><p>Finally, I had to manage missing or incomplete data while keeping the analysis realistic.</p>
</li>
</ul>
<p>These challenges helped me understand how real-world data works: it’s not perfect, and cleaning it up is as important as the analysis itself.</p>
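<p>For the monthly-file problem in the first point, a short loop can stand in for manual merging. This is a rough sketch, assuming the files share the same columns and follow a hypothetical naming pattern such as <code>london_2023_01.csv</code>; the sample files are generated here only so the snippet runs on its own.</p>

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create three small stand-in files (hypothetical names and values)
folder = Path(tempfile.mkdtemp())
for month in (1, 2, 3):
    sample = pd.DataFrame({
        "date": pd.date_range(f"2023-{month:02d}-01", periods=2, freq="D"),
        "tavg": [5.0 + month, 6.0 + month],
    })
    sample.to_csv(folder / f"london_2023_{month:02d}.csv", index=False)

# Read every monthly file in name order and stack them into one DataFrame
files = sorted(folder.glob("london_2023_*.csv"))
year = pd.concat(
    (pd.read_csv(f, parse_dates=["date"]) for f in files),
    ignore_index=True,
)

print(year.shape)
```

<p>Zero-padded month numbers keep the alphabetical file order and the calendar order the same, so the combined rows stay chronological.</p>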
<hr />
<h2 id="heading-this-weeks-reflection">This week’s reflection</h2>
<p>By week seven of my learning journey, I realized that data science isn’t just about accuracy or prediction.<br />It’s about understanding context and solving real problems, even simple ones like knowing when the weather might treat you kindly.</p>
<p>This project taught me how to turn data into a story, one that can influence real-life decisions.</p>
<p>Maybe the code didn’t predict stock prices or train a neural network, but it did something equally valuable: it helped find the best time to pause, breathe, and enjoy life.</p>
<p>And maybe that’s what good data work should be about.</p>
<hr />
<h3 id="heading-helpful-resources"><strong>Helpful Resources</strong></h3>
<p><strong>Dataset:</strong> <a target="_blank" href="https://meteostat.net/en/station/03772?t=2023-01-01/2023-12-31">https://meteostat.net/en/station/03772?t=2023-01-01/2023-12-31</a></p>
]]></content:encoded></item><item><title><![CDATA[Practical Data Analysis: My Experience with NumPy and Pandas]]></title><description><![CDATA[Introduction
The most interesting thing about NumPy and pandas isn’t their speed. It’s how they quietly handle chaos. You can feed them messy, inconsistent data from six cities, and they’ll help you turn it into something that makes sense.
That’s wha...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/practical-data-analysis-my-experience-with-numpy-and-pandas</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/practical-data-analysis-my-experience-with-numpy-and-pandas</guid><category><![CDATA[#DataAnalysis #Python #Pandas #NumPy #DataCleaning #EDA #DataScience #PythonForDataAnalysis #OpenData #Analytics]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Sun, 26 Oct 2025 19:36:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761608802044/328ee979-f703-4de1-a9f8-e64dd64a77cb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-introduction">Introduction</h2>
<p>The most interesting thing about <strong>NumPy</strong> and <strong>pandas</strong> isn’t their speed. It’s how they quietly handle chaos. You can feed them messy, inconsistent data from six cities, and they’ll help you turn it into something that makes sense.</p>
<p>That’s what I worked on this week: exploring, cleaning, and analyzing real datasets using both tools.</p>
<hr />
<h2 id="heading-understanding-the-tools">Understanding the Tools</h2>
<p>Before analysis, I wanted to understand what makes these two libraries work so well together.</p>
<p><strong>NumPy</strong> handles numerical computation. It’s the foundation that powers most of data science in Python. It offers arrays, matrix operations, and mathematical functions that make large-scale calculations fast and efficient. For example, dividing every value in a list by 100 or slicing part of a 3D array takes one line.</p>
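<p>The two one-liners mentioned above look roughly like this. The numbers are arbitrary and only there to make the operations visible.</p>

```python
import numpy as np

values = [250, 75, 1200]

# Divide every value by 100 in one vectorized step
scaled = np.array(values) / 100
print(scaled)

# A 3D array of shape (2, 3, 4): two blocks, three rows, four columns
cube = np.arange(24).reshape(2, 3, 4)

# From every block, take the last row and keep its first two columns
corner = cube[:, -1, :2]
print(corner)
```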
<p><strong>Pandas</strong> builds on that foundation. It adds structure through DataFrames: tables of rows and columns you can manipulate easily. Think of it as Excel with Python-level control. You can clean, group, and summarize data in a way that’s both logical and flexible.</p>
<p>In short:</p>
<ul>
<li><p><strong>NumPy is for numbers.</strong></p>
</li>
<li><p><strong>Pandas is for meaning.</strong></p>
</li>
</ul>
<hr />
<h2 id="heading-core-tasks-i-worked-on">Core Tasks I Worked On</h2>
<p>I worked with six CSV files: <code>Beijing</code>, <code>Brasilia</code>, <code>Cape Town</code>, <code>Delhi</code>, <code>London</code>, and <code>Moscow</code>. They looked uniform at first glance but weren’t. Once loaded into pandas, I found inconsistencies that made merging impossible without cleanup.</p>
<p>Here’s how I handled them.</p>
<h3 id="heading-loading-and-standardizing-the-data">Loading and Standardizing the Data</h3>
<p><strong>1. Encoding differences</strong></p>
<p>When I tried loading the London dataset, it didn’t display correctly and showed a “bad delimiter” error. This usually happens when the file contains special characters that aren’t supported by the default encoding. To fix it, I reloaded the file using a different encoding format called <code>Latin-1</code>, which handles a wider range of characters. After that adjustment, the file loaded without issues.</p>
<pre><code class="lang-python">pd.read_csv(<span class="hljs-string">'London_2014.csv'</span>, encoding=<span class="hljs-string">'latin1'</span>)
</code></pre>
<p>Once confirmed, I used this encoding only for that file.</p>
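<p>When it isn’t obvious which file needs which encoding, one option is to try a short list in order. The helper below is a hypothetical sketch (<code>read_csv_with_fallback</code> is not part of pandas), and the sample file is created on the spot so the example runs by itself.</p>

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin1")):
    """Try each encoding in turn and return the first DataFrame that loads."""
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"None of {encodings} could decode {path}")


# Stand-in file containing a Latin-1 degree sign, which is not valid UTF-8
path = Path(tempfile.mkdtemp()) / "sample_latin1.csv"
path.write_bytes("City,Temp\nLondon,12\xb0C\n".encode("latin1"))

df = read_csv_with_fallback(path)
print(df)
```

<p>Latin-1 can decode any byte sequence, so it works best as the last entry in the list; placed first, it would never give another encoding a chance.</p>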
<p><strong>2. Inconsistent column names</strong></p>
<p>Some files had extra spaces, HTML tags, or inconsistent casing. I standardized all headers:</p>
<pre><code class="lang-python">df.columns = (
    df.columns
    .str.strip()
    .str.replace(<span class="hljs-string">' '</span>, <span class="hljs-string">'_'</span>)
    .str.replace(<span class="hljs-string">'&lt;br_/&gt;'</span>, <span class="hljs-string">''</span>, regex=<span class="hljs-literal">True</span>)
)
</code></pre>
<p><strong>3. Adding city identifiers</strong></p>
<p>Each CSV represented one city. To track them after merging:</p>
<pre><code class="lang-python">df[<span class="hljs-string">'City'</span>] = file.split(<span class="hljs-string">'_'</span>)[<span class="hljs-number">0</span>]
</code></pre>
<p><strong>4. Combining all datasets</strong></p>
<p>After cleanup:</p>
<pre><code class="lang-python">all_data = pd.concat(dfs, ignore_index=<span class="hljs-literal">True</span>)
</code></pre>
<p>The merged dataset had 2,190 rows and 25 columns.</p>
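<p>Put together, the loading steps could run in a single loop. This is a simplified sketch under a few assumptions: the file names follow the <code>City_2014.csv</code> pattern, only the space cleanup is shown, and two tiny stand-in files replace the real six.</p>

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in files so the sketch runs; the real project used six city files
folder = Path(tempfile.mkdtemp())
(folder / "London_2014.csv").write_text("Date, Mean TemperatureC\n2014-01-01,7\n")
(folder / "Moscow_2014.csv").write_text("Date,Mean TemperatureC \n2014-01-01,-5\n")

dfs = []
for path in sorted(folder.glob("*_2014.csv")):
    df = pd.read_csv(path)
    # Trim stray spaces and use underscores so headers match across files
    df.columns = df.columns.str.strip().str.replace(" ", "_")
    # City name is the part of the file name before the first underscore
    df["City"] = path.name.split("_")[0]
    dfs.append(df)

all_data = pd.concat(dfs, ignore_index=True)
print(all_data)
```

<p>Because the headers are cleaned before concatenation, both files land in the same columns instead of producing near-duplicate ones filled with missing values.</p>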
<p><strong>5. Verification</strong></p>
<p>I confirmed the structure and data types before cleaning:</p>
<pre><code class="lang-python">all_data.info()
all_data.head()
</code></pre>
<hr />
<h3 id="heading-cleaning-and-preparing-the-data">Cleaning and Preparing the Data</h3>
<p>Real-world data always comes with missing or noisy values, and this dataset was no exception. I ran a few checks to decide how to handle it. The first check:</p>
<pre><code class="lang-python">all_data.isnull().sum()
</code></pre>
<p><strong>Findings:</strong></p>
<ul>
<li><p><code>CloudCover</code> had 432 missing entries</p>
</li>
<li><p><code>Max_Gust_SpeedKm/h</code> and <code>GMT</code> were mostly empty</p>
</li>
<li><p>Visibility columns had small gaps</p>
</li>
<li><p><code>Events</code> had partial missing text</p>
</li>
</ul>
<p><strong>Fixes applied:</strong></p>
<p>I removed the <code>GMT</code> and <code>Max_Gust_SpeedKm/h</code> columns because they were mostly empty and not useful for analysis. For the visibility columns I replaced the missing values with each column’s average to keep the data consistent. The Events column had missing text entries, so I filled those gaps with the word “None” to indicate no recorded event instead of leaving them blank.</p>
<pre><code class="lang-python">all_data.drop([<span class="hljs-string">'GMT'</span>, <span class="hljs-string">'Max_Gust_SpeedKm/h'</span>], axis=<span class="hljs-number">1</span>, inplace=<span class="hljs-literal">True</span>)
all_data[<span class="hljs-string">'Mean_VisibilityKm'</span>] = all_data[<span class="hljs-string">'Mean_VisibilityKm'</span>].fillna(all_data[<span class="hljs-string">'Mean_VisibilityKm'</span>].mean())
all_data[<span class="hljs-string">'Max_VisibilityKm'</span>] = all_data[<span class="hljs-string">'Max_VisibilityKm'</span>].fillna(all_data[<span class="hljs-string">'Max_VisibilityKm'</span>].mean())
all_data[<span class="hljs-string">'Min_VisibilitykM'</span>] = all_data[<span class="hljs-string">'Min_VisibilitykM'</span>].fillna(all_data[<span class="hljs-string">'Min_VisibilitykM'</span>].mean())
all_data[<span class="hljs-string">'Events'</span>] = all_data[<span class="hljs-string">'Events'</span>].fillna(<span class="hljs-string">'None'</span>)
</code></pre>
<p>After this, the only remaining missing column was <code>CloudCover</code>. I left it unfilled since imputing it could distort results.</p>
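<p>To put a number on “mostly empty”, the share of missing values per column can be computed before deciding what to drop. This is a small sketch with made-up rows; the column names mirror the ones above, and the 60% cut-off is an arbitrary assumption, not a rule used in the project.</p>

```python
import numpy as np
import pandas as pd

# Small made-up frame; column names mirror the article, values do not
demo = pd.DataFrame({
    "GMT": [np.nan, np.nan, np.nan, "2014-1-4"],
    "Mean_VisibilityKm": [10.0, np.nan, 8.0, 9.0],
    "Events": ["Rain", None, None, "Fog"],
})

# Fraction of missing values per column, largest first
missing_share = demo.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Hypothetical cut-off: drop columns that are more than 60% empty
threshold = 0.6
to_drop = missing_share[missing_share > threshold].index.tolist()
cleaned = demo.drop(columns=to_drop)
print(to_drop)
```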
<hr />
<h3 id="heading-exploring-the-data">Exploring the Data</h3>
<p>After cleaning the dataset, I grouped the data by <code>City</code> to understand weather patterns across locations. For each city, I calculated the average, maximum, and minimum temperatures, the total precipitation, and the average humidity. This summary gave a clear comparison of climate characteristics for all six cities in one view.</p>
<pre><code class="lang-python">summary = all_data.groupby(<span class="hljs-string">'City'</span>).agg({
    <span class="hljs-string">'Mean_TemperatureC'</span>: <span class="hljs-string">'mean'</span>,
    <span class="hljs-string">'Max_TemperatureC'</span>: <span class="hljs-string">'max'</span>,
    <span class="hljs-string">'Min_TemperatureC'</span>: <span class="hljs-string">'min'</span>,
    <span class="hljs-string">'Precipitationmm'</span>: <span class="hljs-string">'sum'</span>,
    <span class="hljs-string">'Mean_Humidity'</span>: <span class="hljs-string">'mean'</span>
})
print(summary)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td>City</td><td>Mean Temp (°C)</td><td>Max Temp (°C)</td><td>Min Temp (°C)</td><td>Precipitation (mm)</td><td>Mean Humidity (%)</td></tr>
</thead>
<tbody>
<tr>
<td>Beijing</td><td>13.36</td><td>42</td><td>-13</td><td>405.89</td><td>50.75</td></tr>
<tr>
<td>Brasilia</td><td>22.90</td><td>36</td><td>9</td><td>751.65</td><td>58.07</td></tr>
<tr>
<td>Cape Town</td><td>17.57</td><td>37</td><td>1</td><td>428.25</td><td>68.76</td></tr>
<tr>
<td>Delhi</td><td>13.71</td><td>38</td><td>-17</td><td>225.31</td><td>50.64</td></tr>
<tr>
<td>London</td><td>12.33</td><td>30</td><td>-4</td><td>503.10</td><td>73.80</td></tr>
<tr>
<td>Moscow</td><td>5.99</td><td>33</td><td>-26</td><td>0.00</td><td>73.38</td></tr>
</tbody>
</table>
</div><p><strong>My key findings:</strong></p>
<p>London and Moscow had the highest humidity.<br />Brasilia recorded the most rainfall.<br />Moscow showed zero precipitation, likely a recording issue rather than an actual dry year.</p>
<p><strong>Correlation analysis</strong></p>
<p>Next, I ran a correlation analysis to see how the numerical weather variables relate to each other. This step measures how changes in one variable (like temperature) correspond to changes in another (like humidity or visibility). By calculating correlations only for numeric columns, I could identify strong relationships: for example, the temperature columns showed a high positive correlation with one another, and humidity showed a moderate negative correlation with visibility.</p>
<pre><code class="lang-python">corr = all_data.corr(numeric_only=<span class="hljs-literal">True</span>)
print(corr)
</code></pre>
<p><strong>My key findings:</strong></p>
<ul>
<li><p>Temperature columns (Max, Mean, Min) were strongly correlated.</p>
</li>
<li><p>Visibility had a moderate negative correlation with humidity.</p>
</li>
<li><p>Cloud cover correlated negatively with both temperature and visibility.</p>
</li>
</ul>
<p>Even without charts, this gave a clear sense of how weather variables interacted.</p>
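<p>A full correlation matrix gets hard to read as columns are added, so it can help to list the pairs from strongest to weakest. This is a sketch on synthetic data (not the six-city dataset), with the relationship between the two temperature columns built in on purpose.</p>

```python
import numpy as np
import pandas as pd

# Synthetic data (not the six-city dataset) with a built-in relationship
rng = np.random.default_rng(seed=1)
mean_temp = rng.normal(15, 8, 200)
demo = pd.DataFrame({
    "Mean_TemperatureC": mean_temp,
    "Max_TemperatureC": mean_temp + rng.normal(5, 1, 200),
    "Mean_Humidity": rng.normal(60, 10, 200),
})

corr = demo.corr(numeric_only=True)

# Keep only the upper triangle so each pair appears once, without the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Order pairs by absolute strength, strongest first
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

<p>Sorting on the absolute value keeps strong negative relationships near the top, next to strong positive ones.</p>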
<hr />
<h3 id="heading-practicing-with-numpy-and-the-who-pop-tb-dataset">Practicing with NumPy and the WHO POP TB Dataset</h3>
<p><strong>NumPy element-wise comparison</strong></p>
<p>I practiced <strong>NumPy’s element-wise comparison</strong>, among other operations, to understand how it evaluates two arrays value by value. Using simple arrays of numbers, I compared them with operations like “greater than,” “less than,” and “equal to.” NumPy instantly returned Boolean results (<code>True</code> or <code>False</code>) for each position, showing how efficiently it can handle large-scale comparisons without loops. This helped me see how NumPy’s vectorized operations make numerical analysis both fast and intuitive.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

a = np.array([<span class="hljs-number">2</span>, <span class="hljs-number">4</span>, <span class="hljs-number">9</span>])
b = np.array([<span class="hljs-number">1</span>, <span class="hljs-number">5</span>, <span class="hljs-number">9</span>])
print(np.greater(a, b))
print(np.greater_equal(a, b))
print(np.less(a, b))
print(np.less_equal(a, b))
</code></pre>
<p>These quick operations show how efficiently NumPy performs comparisons across arrays, and the same logic applies to larger datasets.</p>
<p><strong>Indexing and filtering with the WHO POP TB dataset</strong></p>
<p>To strengthen my understanding of indexing and conditional selection, the task had me practice with the <code>WHO POP TB</code> dataset, which includes country-level population and tuberculosis statistics.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Display the 55th row</span>
df.iloc[<span class="hljs-number">54</span>]

<span class="hljs-comment"># Display first 10 rows</span>
df.head(<span class="hljs-number">10</span>)

<span class="hljs-comment"># Show first 8 rows of Country and TB deaths</span>
df[[<span class="hljs-string">'Country'</span>, <span class="hljs-string">'TB deaths'</span>]].head(<span class="hljs-number">8</span>)

<span class="hljs-comment"># Find countries with TB deaths &gt; 10,000</span>
df[df[<span class="hljs-string">'TB deaths'</span>] &gt; <span class="hljs-number">10000</span>]

<span class="hljs-comment"># Find where population (in thousands) ≤ 50,000 or TB deaths ≥ 20,000</span>
df[(df[<span class="hljs-string">'Population (1000s)'</span>] &lt;= <span class="hljs-number">50000</span>) | (df[<span class="hljs-string">'TB deaths'</span>] &gt;= <span class="hljs-number">20000</span>)]
</code></pre>
<p>This exercise helped reinforce selection logic and conditional filtering skills that make real-world data analysis smoother and faster.</p>
<hr />
<h3 id="heading-challenges-i-faced">Challenges I Faced</h3>
<p>I ran into a few practical issues during the process. The London CSV file used a different encoding, which caused loading errors until I specified the correct one. Some column names had hidden spaces and HTML tags that made merging fail until I standardized them. The <code>CloudCover</code> column was incomplete and couldn’t be filled reliably without distorting results. I also noticed type mismatches when comparing numeric columns, which required conversions before analysis. These challenges might seem minor, but they show how unpredictable real datasets can be and why careful data handling is key to working effectively with pandas.</p>
<ul>
<li><p>The London CSV used a different encoding.</p>
</li>
<li><p>Hidden spaces and HTML tags broke merges.</p>
</li>
<li><p><code>CloudCover</code> was incomplete.</p>
</li>
<li><p>Type mismatches appeared when comparing numeric columns.</p>
</li>
</ul>
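<p>For the type mismatches in the last point, <code>pd.to_numeric</code> is one way to force a text column into numbers. This is an illustrative sketch; the column name is borrowed from the dataset, but the values, including the non-numeric placeholder <code>"T"</code>, are made up.</p>

```python
import pandas as pd

# Made-up column where numbers were read in as text, including a placeholder
demo = pd.DataFrame({"Precipitationmm": ["0.0", "2.5", "T", "10"]})
print(demo["Precipitationmm"].dtype)   # a text dtype, not numeric

# Convert to numbers; values that cannot be parsed become NaN
demo["Precipitationmm"] = pd.to_numeric(demo["Precipitationmm"], errors="coerce")

print(demo["Precipitationmm"].dtype)
print(demo[demo["Precipitationmm"] > 1])
```

<p>Counting the NaN values before and after the conversion shows how many entries could not be parsed, which is worth checking before any comparison or aggregation.</p>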
<hr />
<h3 id="heading-my-perspective-and-takeaways">My Perspective and Takeaways</h3>
<p>This week reinforced one main idea: most data analysis happens before visualization or modeling.<br />The key steps are <strong>loading</strong>, <strong>cleaning</strong>, and <strong>validating</strong> the data. That’s where real insight begins.</p>
<p>A few lessons stood out:</p>
<ul>
<li><p>Check encoding before merging files.</p>
</li>
<li><p>Standardize column names early.</p>
</li>
<li><p>Fill missing values only after understanding their meaning.</p>
</li>
<li><p>Use correlation to identify patterns quickly.</p>
</li>
</ul>
<p>NumPy and pandas aren’t flashy tools, but they’re dependable ones.<br />Once your data is structured and consistent, finding insights becomes the easy part.</p>
<hr />
<h3 id="heading-helpful-resources">HELPFUL RESOURCES</h3>
<p><strong>Youtube:</strong> <a target="_blank" href="https://youtu.be/wUSDVGivd-8?si=d05zHtoyNTABnIj1">https://youtu.be/wUSDVGivd-8?si=d05zHtoyNTABnIj1</a></p>
]]></content:encoded></item><item><title><![CDATA[Improving Methods for Predicting Concrete Strength]]></title><description><![CDATA[Introduction
This research proposal came from looking back at my final-year project. I had developed a strength prediction model for concrete using regression and ANOVA in Excel. The goal was to predict compressive strength from mix ratios and curing ...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/improving-methods-for-predicting-concrete-strength</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/improving-methods-for-predicting-concrete-strength</guid><category><![CDATA[#MachineLearning #CivilEngineering #DataScience #Concrete #Sustainability #Research  ]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Thu, 23 Oct 2025 11:26:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761216739691/60d5e98b-afdd-43a5-b764-025172956831.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h3 id="heading-intoduction"><strong>Introduction</strong></h3>
<p>This research proposal came from looking back at my final-year project. I had developed a strength prediction model for concrete using regression and ANOVA in Excel. The goal was to predict compressive strength from mix ratios and curing conditions.</p>
<p>The model worked within a narrow range. Once the material composition changed, accuracy dropped. That made me think deeper about prediction in concrete mix design and how traditional methods struggle when data doesn’t follow simple patterns.</p>
<hr />
<h3 id="heading-integrating-past-work-into-future-vision">Integrating Past Work into Future Vision</h3>
<p>One of the recommendations in my final-year project was to “apply advanced computational models to improve strength prediction accuracy.” It was a brief note at the time, but it later shaped my new proposal:</p>
<blockquote>
<p><strong>A Machine Learning–Based Predictive Framework for Estimating the Compressive Strength of Hybrid Concretes Incorporating Supplementary Cementitious Materials.</strong></p>
</blockquote>
<p>The proposal combines civil engineering practice with data science. It focuses on:</p>
<ul>
<li><p>Using open-source concrete datasets</p>
</li>
<li><p>Cleaning and preprocessing data</p>
</li>
<li><p>Training and testing several ML models with cross-validation</p>
</li>
<li><p>Using SHAP analysis to identify which inputs (binder ratio, curing age, water-to-binder ratio, etc.) influence strength the most</p>
</li>
</ul>
<p>The aim is to build a data-driven framework that helps engineers design concrete mixes that use less cement while maintaining performance.</p>
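<p>The cross-validation step in the list above might look something like the sketch below. It is not the proposal’s actual pipeline: the data is synthetic, the three input columns are unnamed stand-ins for mix variables, and the two models are just examples of what could be compared.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: three made-up inputs and a nonlinear target
rng = np.random.default_rng(seed=42)
X = rng.uniform(0, 1, size=(300, 3))
y = 40 - 60 * (X[:, 0] - 0.5) ** 2 + 5 * X[:, 1] + rng.normal(0, 1, 300)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold cross-validated R^2 for each model
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()
    print(name, round(results[name], 3))
```

<p>Scoring every model on the same folds keeps the comparison fair, and the averaged score is less sensitive to one lucky or unlucky train/test split.</p>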
<hr />
<h3 id="heading-why-this-matters">Why This Matters</h3>
<p>Concrete production accounts for about 8% of global CO₂ emissions, mainly from cement. Reducing cement content without losing strength is a key engineering problem.</p>
<p>If machine learning can predict the right mix combinations, you can:</p>
<ul>
<li><p>Cut down on laboratory trials</p>
</li>
<li><p>Save time and material costs</p>
</li>
<li><p>Design more efficient mixes</p>
</li>
<li><p>Reduce cement use and emissions</p>
</li>
</ul>
<p>This approach doesn’t replace engineering judgment. It supports it with data. Engineers still make decisions, but with better insight from models that learn from large datasets.</p>
<p>Using ML in mix design could make research more data-oriented, less repetitive, and more sustainable.</p>
<hr />
<h3 id="heading-looking-back-what-i-learned">Looking Back: What I Learned</h3>
<p>In most mix design practice, engineers rely on:</p>
<ul>
<li><p>Trial-and-error methods</p>
</li>
<li><p>Empirical formulas</p>
</li>
<li><p>Linear regression models</p>
</li>
</ul>
<p>These methods are easy to apply but assume relationships between inputs and strength stay constant. In reality, materials change. Cement from different sources, variations in aggregates, and the use of supplementary materials all shift outcomes.</p>
<p>Regression models can’t always handle these changes. They assume straight-line relationships where the real world behaves differently.</p>
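<p>A small numerical illustration of that limitation follows. It assumes an invented, noise-free curve in which strength peaks at a mid-range cement replacement level; the numbers are made up to show the shape of the problem, not taken from any test data.</p>

```python
import numpy as np

# Hypothetical, noise-free illustration: strength peaks at a mid-range
# replacement level and falls on either side (values are made up)
replacement = np.linspace(0, 40, 9)                # % cement replaced
strength = 35 - 0.05 * (replacement - 20) ** 2     # MPa, invented curve


def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot


print("straight line R^2:", round(r_squared(replacement, strength, 1), 3))
print("quadratic R^2:   ", round(r_squared(replacement, strength, 2), 3))
```

<p>On a curve that rises and then falls, the best straight line is nearly flat and explains almost none of the variation, while a model that allows curvature follows it closely.</p>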
<hr />
<h3 id="heading-why-machine-learning-makes-sense">Why Machine Learning Makes Sense</h3>
<p>Machine learning (ML) doesn’t rely on fixed formulas. It learns directly from data. That means it can capture how multiple variables interact even when the relationship isn’t linear.</p>
<p>For example, in concrete with additives like fly ash or laterite, strength gain doesn’t increase uniformly with mix ratio. ML models such as:</p>
<ul>
<li><p>Random Forest</p>
</li>
<li><p>XGBoost</p>
</li>
<li><p>Neural Networks</p>
</li>
</ul>
<p>can find those patterns automatically.</p>
<p>Here’s the difference:</p>
<ul>
<li><p>Regression fits one global equation to all data.</p>
</li>
<li><p>ML adapts to local variations and complex interactions.</p>
</li>
</ul>
<p>That’s why ML suits concrete strength prediction: it can work with the uncertainty that traditional models can’t explain.</p>
<hr />
<h3 id="heading-whats-next">What’s Next</h3>
<p>My next step is to build the technical foundation for this work. I’m focusing on:</p>
<ul>
<li><p>Python for analysis and scripting</p>
</li>
<li><p>Scikit-learn, XGBoost, and TensorFlow for model development</p>
</li>
<li><p>Data visualization tools like Power BI and Excel for results interpretation</p>
</li>
</ul>
<p>The plan is to turn this proposal into a capstone project once I’m confident with these tools. It will cover everything from data preprocessing to feature analysis and model validation.</p>
<p>This direction connects my civil engineering background with data science. It’s about improving how concrete performance is predicted using data, not guesswork.</p>
<p>The future of construction depends on smarter design. Machine learning can help make that happen.</p>
<hr />
<h2 id="heading-closing-thought">Closing Thought</h2>
<p>Construction is evolving. The way we design, predict, and test materials must evolve too. Machine learning offers a practical way to make that shift, not by replacing engineers, but by giving them better tools to make data-backed decisions.</p>
<p>For me, this research isn’t just academic; it’s a direction. It connects what I’ve done before with where I want to go next: using machine learning to improve how we predict and design sustainable concrete.</p>
]]></content:encoded></item><item><title><![CDATA[Comprehensive Guide to Python Modules: JSON, Math, and Beyond]]></title><description><![CDATA[Introduction: When Coding Gets Real
There’s a point in every beginner’s coding journey where things stop feeling like simple toy problems and start resembling the real world. For me, that point came in Week 3 of my learning journey with Dataraflow.
U...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/comprehensive-guide-to-python-modules-json-math-and-beyond</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/comprehensive-guide-to-python-modules-json-math-and-beyond</guid><category><![CDATA[#Python #DataScience  #LearningInPublic  #Dataraflow]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Sun, 28 Sep 2025 20:00:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759088722329/bfb2af3f-9605-4093-88ea-2b6798908138.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-introduction-when-coding-gets-real">Introduction: When Coding Gets Real</h2>
<p>There’s a point in every beginner’s coding journey where things stop feeling like simple toy problems and start resembling the real world. For me, that point came in <strong>Week 3</strong> of my learning journey with <strong>Dataraflow</strong>.</p>
<p>Up until last week, I was happily experimenting with <strong>Object-Oriented Programming (OOP)</strong>: creating classes, inheriting attributes, and even trying my hand at polymorphism. But then the story shifted. Suddenly, I wasn’t just writing my own code; I was asked to step into Python’s <em>toolbox</em> and start using <strong>modules</strong>, powerful pre-built functionality that makes programming more efficient.</p>
<p>And let me be honest: at first, it felt overwhelming.<br />Math functions, date manipulation, JSON file handling, virtual environments — it was like walking into a supermarket for the first time and realising you need more than a basket. But as I dug deeper, I realised that these modules are the very <em>shortcuts</em> that transform you from a beginner writing “Hello World” into a problem-solver who can build real systems.</p>
<hr />
<h2 id="heading-theory-modules-packages-and-why-they-matter">Theory: Modules, Packages, and Why They Matter</h2>
<p>So, what exactly did I learn?</p>
<p>A <strong>module</strong> is basically a Python file containing code (functions, classes, or variables) that you can reuse in different programs. For example, the <code>math</code> module saves you the pain of writing formulas from scratch. A <strong>package</strong>, on the other hand, is a collection of modules organised neatly in directories, often with an <code>__init__.py</code> file that signals “Hey, I’m a package!” (think of libraries like <code>numpy</code> or <code>pandas</code>).</p>
<p>Why does this matter? Because as projects get bigger, no one has time to reinvent the wheel. You don’t want to write your own trigonometry functions or build your own date formatter. Instead, you <em>borrow</em> tools from Python’s rich ecosystem and focus on solving your unique problem.</p>
<p>Alongside modules, I met some equally powerful companions:</p>
<ul>
<li><p><strong>JSON (JavaScript Object Notation):</strong> A universal data format that makes Python dictionaries talk to the outside world (APIs, files, web apps).</p>
</li>
<li><p><strong>Datetime:</strong> Working with dates and times. Suddenly, birthdays, deadlines, and countdowns became programmable.</p>
</li>
<li><p><strong>Error Handling (</strong><code>try/except</code>): A lifesaver that catches mistakes gracefully instead of letting your program crash.</p>
</li>
<li><p><strong>Virtual Environments:</strong> My own coding bubble where dependencies don’t clash. Essential for serious projects.</p>
</li>
</ul>
<p>It felt like moving from toy blocks to a real workshop full of tools.</p>
<hr />
<h2 id="heading-practical-tasks-building-with-modules">Practical Tasks: Building with Modules</h2>
<p>The week wasn’t just theory. I had hands-on assignments, and with each one, I discovered something new about myself as a programmer. Some of these tasks are listed below, with some boring code just to show how practical they are.</p>
<h3 id="heading-task-1-math-module">Task 1: Math Module</h3>
<p><strong>What I Learnt:</strong> I didn’t need to memorise or code formulas; Python had my back.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> math

print(<span class="hljs-string">"Square root of 144:"</span>, math.sqrt(<span class="hljs-number">144</span>))
print(<span class="hljs-string">"Factorial of 6:"</span>, math.factorial(<span class="hljs-number">6</span>))
print(<span class="hljs-string">"Pi constant:"</span>, math.pi)
</code></pre>
<p>This was my first “oh, oh!” moment. Instead of writing multiple lines to calculate factorials, <code>math.factorial(6)</code> gave me the result instantly. It made me feel like I was wielding a scientific calculator built into Python.</p>
<hr />
<h3 id="heading-task-2-datetime-module">Task 2: Datetime Module</h3>
<p><strong>What I Learnt:</strong> Dates can be messy. Is it 09/12/2025 or 12/09/2025? Python’s <code>datetime</code> stripped away confusion and let me format time however I wanted.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime

now = datetime.now()
print(<span class="hljs-string">"Current Date &amp; Time:"</span>, now)
print(<span class="hljs-string">"Formatted:"</span>, now.strftime(<span class="hljs-string">"%d-%m-%Y"</span>))
</code></pre>
<p>I also practiced calculating the number of days until my next birthday. Suddenly, math + datetime became personal.</p>
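<p>A minimal sketch of that birthday countdown might look like this. The function name and the example dates are my own assumptions for illustration, not the exact code from the exercise.</p>

```python
from datetime import date


def days_until_birthday(birth_month, birth_day, today=None):
    """Return the number of days from `today` to the next birthday.

    Note: a 29 February birthday would need extra handling in non-leap years.
    """
    today = today or date.today()
    next_birthday = date(today.year, birth_month, birth_day)
    if today > next_birthday:
        # This year's birthday has already passed, so use next year's.
        next_birthday = date(today.year + 1, birth_month, birth_day)
    return (next_birthday - today).days


# Hypothetical example: a 25 December birthday, counted from 28 September 2025.
print(days_until_birthday(12, 25, today=date(2025, 9, 28)))  # 88
```

<p>Passing <code>today</code> in as a parameter makes the function easy to check with a fixed date. Leaving it out falls back to the real current date.</p>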
<hr />
<h3 id="heading-task-3-json-module">Task 3: JSON Module</h3>
<p><strong>What I Learnt:</strong> JSON is like a bridge between Python and the world.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json

student = {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Ezekiel"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">23</span>, <span class="hljs-string">"grade"</span>: <span class="hljs-string">"A"</span>}
student_json = json.dumps(student)   <span class="hljs-comment"># dict → JSON</span>
print(<span class="hljs-string">"JSON String:"</span>, student_json)

parsed = json.loads(student_json)    <span class="hljs-comment"># JSON → dict</span>
print(<span class="hljs-string">"Parsed Dict:"</span>, parsed)
</code></pre>
<p>When I first saw curly braces and strings, I thought: <em>“Wait, isn’t this just a dictionary?”</em> But then it clicked — JSON is the universal <strong>language of the internet</strong>. That’s how APIs talk. That’s how data travels between apps.</p>
<hr />
<h3 id="heading-task-4-error-handling">Task 4: Error Handling</h3>
<p><strong>What I Learnt:</strong> Errors don’t have to kill your code. They can be tamed.</p>
<pre><code class="lang-python"><span class="hljs-keyword">try</span>:
    num = int(input(<span class="hljs-string">"Enter a number: "</span>))
    print(<span class="hljs-string">"100 divided by"</span>, num, <span class="hljs-string">"="</span>, <span class="hljs-number">100</span> / num)
<span class="hljs-keyword">except</span> ZeroDivisionError:
    print(<span class="hljs-string">"Oops! You can’t divide by zero."</span>)
<span class="hljs-keyword">except</span> ValueError:
    print(<span class="hljs-string">"Please enter a valid integer."</span>)
</code></pre>
<p>I remember the first time I divided by zero during practice. Instead of a scary “ZeroDivisionError” traceback, my program now politely said: <em>“Oops! You can’t divide by zero.”</em> That moment felt empowering — like I had just added safety rails to my code.</p>
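<p>The same safety rails can be wrapped in a function, which makes them easier to test than a script that waits for <code>input()</code>. This is only a sketch; the function name <code>divide_hundred_by</code> is hypothetical.</p>

```python
def divide_hundred_by(text):
    """Return 100 / int(text), or a friendly message if that fails."""
    try:
        return 100 / int(text)
    except ZeroDivisionError:
        return "Oops! You can't divide by zero."
    except ValueError:
        return "Please enter a valid integer."


# Try a good value, a zero, and something that is not a number.
for raw in ["4", "0", "abc"]:
    print(repr(raw), "->", divide_hundred_by(raw))
```

<p>Each <code>except</code> names one specific exception, so any other unexpected error would still surface instead of being silently swallowed.</p>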
<hr />
<h3 id="heading-task-5-virtual-environments">Task 5: Virtual Environments</h3>
<p><strong>What I Learnt:</strong> Every project deserves its own bubble.</p>
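<pre><code class="lang-python"><span class="hljs-comment"># Create a virtual environment named myenv</span>
python -m venv myenv
<span class="hljs-comment"># Activate it (Windows; on macOS/Linux use: source myenv/bin/activate)</span>
myenv\Scripts\activate
<span class="hljs-comment"># Install a package inside the environment, then list what is installed</span>
pip install numpy
pip list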
</code></pre>
<p>At first, I was annoyed: why couldn’t I just install everything globally? But then I understood. Virtual environments are like “separate kitchens” for each project. You don’t want to cook jollof rice in the same pot you used for egusi soup. Keeping dependencies isolated saves so much future headache.</p>
<hr />
<h2 id="heading-fun-project-library-management-system-upgraded">Fun Project: Library Management System (Upgraded)</h2>
<p>By combining <strong>OOP (Week 2)</strong> and <strong>Modules (Week 3)</strong>, I upgraded my Library Management System.</p>
<ul>
<li><p>Books and members were saved in a JSON file — so data persisted even after the program stopped.</p>
</li>
<li><p>I used <code>datetime</code> to record when a book was borrowed and calculate due dates.</p>
</li>
<li><p>Error handling prevented users from borrowing unavailable books.</p>
</li>
</ul>
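<p>The three features above can be sketched in a few lines. This is a simplified illustration that uses plain dictionaries instead of my actual classes. The 14-day loan period, the file name, the book title, and the function names are all assumptions.</p>

```python
import json
import tempfile
from datetime import date, timedelta
from pathlib import Path

LOAN_DAYS = 14  # assumed loan period


def borrow_book(library, title, borrowed_on):
    """Mark a book as borrowed and record its due date, or raise if unavailable."""
    book = library.get(title)
    if book is None or not book["available"]:
        raise ValueError(f"'{title}' is not available")
    book["available"] = False
    book["borrowed_on"] = borrowed_on.isoformat()
    book["due_on"] = (borrowed_on + timedelta(days=LOAN_DAYS)).isoformat()


def save_library(library, path):
    """Write the library dictionary to disk as JSON text."""
    path.write_text(json.dumps(library, indent=2))


def load_library(path):
    """Read the JSON file back into a Python dictionary."""
    return json.loads(path.read_text())


library = {"Things Fall Apart": {"available": True}}
borrow_book(library, "Things Fall Apart", date(2025, 9, 28))

# A temporary folder keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "library.json"
    save_library(library, data_file)
    restored = load_library(data_file)

print(restored["Things Fall Apart"]["due_on"])  # 2025-10-12

# Borrowing the same book again is rejected.
try:
    borrow_book(restored, "Things Fall Apart", date(2025, 9, 29))
except ValueError as error:
    print("Error:", error)
```

<p>Dates are stored as ISO strings because JSON has no date type of its own; <code>date.fromisoformat()</code> can turn them back into <code>date</code> objects when needed.</p>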
<p>Suddenly, my project wasn’t just a code exercise — it felt like the foundation of a real-world app.</p>
<hr />
<h2 id="heading-challenges-i-faced-and-how-i-solved-them">Challenges I Faced (And How I Solved Them)</h2>
<p>This week wasn’t smooth. Honestly, it was frustrating at times. But each struggle shaped my understanding.</p>
<ol>
<li><p><strong>The “Import Confusion” Trap</strong><br /> I often mixed up <code>import module</code> and <code>from module import function</code>. For example, I’d write <code>sqrt(144)</code> without <code>math.</code> and wonder why Python complained. My solution? I slowed down and mapped it in my notes: <em>“import keeps the namespace intact; from imports directly into yours.”</em></p>
</li>
<li><p><strong>The JSON Gibberish Moment</strong><br /> The first time I wrote a dictionary to a file, I opened it and saw gibberish. I thought my code was broken. Turns out, I hadn’t converted it properly with <code>json.dumps()</code>. Lesson: computers need structure, not assumptions.</p>
</li>
<li><p><strong>Virtual Environment Chaos</strong><br /> Activating my virtual environment on Windows gave me endless errors. I typed commands exactly as tutorials showed, but nothing worked. The problem? I hadn’t run my terminal in the right directory. Once I understood the file paths, everything clicked.</p>
</li>
<li><p><strong>Catching Everything (Too Much Error Handling)</strong><br /> In the beginning, my <code>try/except</code> was too broad. I caught every possible error, which made debugging impossible. Later, I learned to target specific exceptions like <code>ValueError</code> or <code>ZeroDivisionError</code>. It was like switching from a giant fishing net to a precise hook.</p>
</li>
</ol>
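<p>The first trap in the list above is easiest to see side by side. A small sketch of the import styles, using only the <code>math</code> module:</p>

```python
import math                   # keeps names inside the math namespace
from math import sqrt         # copies one name into the current namespace
import math as m              # same module under a shorter alias

print(math.sqrt(144))   # qualified with the module name
print(sqrt(144))        # works only because of the `from` import
print(m.sqrt(144))      # the alias points to the same module

# factorial was never imported by name, so a bare call raises NameError.
try:
    factorial(6)
except NameError as error:
    print("NameError:", error)

print(math.factorial(6))  # the qualified call works
```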
<p>Each challenge taught me something deeper than syntax. It taught me <em>how to think as a programmer</em>: slow down, test, debug logically, and trust the process.</p>
<hr />
<h2 id="heading-reflection-amp-key-takeaways">Reflection &amp; Key Takeaways</h2>
<p>This week shifted how I view programming.</p>
<ul>
<li><p><strong>Modules are the real magic.</strong> They extend Python beyond imagination.</p>
</li>
<li><p><strong>Errors aren’t enemies.</strong> They’re guides pointing me to what I don’t yet understand.</p>
</li>
<li><p><strong>Practice beats theory.</strong> Reading about JSON didn’t help until I actually tried saving and loading data.</p>
</li>
<li><p><strong>Integration matters.</strong> Week 2’s OOP + Week 3’s modules gave me my first real taste of building something “bigger.”</p>
</li>
</ul>
<p>Most importantly, I realised programming is not about memorising; it’s about learning how to <em>use the right tool at the right time</em>.</p>
<hr />
<p>✨ That’s a wrap for Week 3!</p>
<p>At the end of this week, I’ve realised that <strong>programming isn’t about how much code you can write, but how smartly you can use what already exists.</strong> Python’s modules taught me that efficiency often comes from <em>standing on the shoulders of giants</em>: using tools that others have perfected so I can focus on solving my own unique problems.</p>
<p>Week 2 gave me the foundation of building structures with OOP, and Week 3 handed me the toolbox of modules to bring those structures to life. Together, they’ve reshaped the way I see coding: not as a set of scary terms, but as a craft where ideas, tools, and persistence come together to create something meaningful.</p>
<p>And honestly? That’s when coding stops being intimidating and starts becoming fun.</p>
<hr />
]]></content:encoded></item><item><title><![CDATA[Grasp Python OOP Effortlessly: My Breakthrough Moment]]></title><description><![CDATA[💡 “Classes, objects, inheritance, polymorphism, scope, and iterators — these are the pillars of Object-Oriented Programming (OOP) in Python. At first glance, they may seem intimidating, but once broken into small, practical steps, they become powerf...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/grasp-python-oop-effortlessly-my-breakthrough-moment</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/grasp-python-oop-effortlessly-my-breakthrough-moment</guid><category><![CDATA[#Python #DataScience  #LearningInPublic  #Dataraflow]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Sun, 21 Sep 2025 00:27:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758413440100/9b49e30a-3407-41f6-996a-43761a986869.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>💡 <strong><em>“Classes, objects, inheritance, polymorphism, scope, and iterators — these are the pillars of Object-Oriented Programming (OOP) in Python. At first glance, they may seem intimidating, but once broken into small, practical steps, they become powerful tools for writing clean, reusable, and structured code. In this post, I’ll share exactly how I learned these concepts, the code that made each of them click, and the project that brought them all together.”</em></strong></p>
<hr />
<h2 id="heading-clear-cut-learning-objectives-week-2">Clear-Cut Learning Objectives: Week 2</h2>
<p>At first glance, this week’s OOP module intrigued me. Beyond the theory, there were clear-cut objectives designed to help me:</p>
<ul>
<li><p>Understand <strong>the building blocks of Python OOP</strong>.</p>
</li>
<li><p>Solve practical <strong>core tasks</strong> to reinforce learning.</p>
</li>
<li><p>Attempt <strong>mini-projects and conceptual questions</strong> for deeper mastery.</p>
</li>
<li><p>Finally, bring it all together in a <strong>fun project</strong> — a Library Management System.</p>
</li>
</ul>
<p>This publication documents <strong>my Week 2 journey</strong>: what I studied, the tasks I solved, the challenges I faced, and my reflections.</p>
<hr />
<h2 id="heading-the-how-theory-code-and-approach">The How: Theory, Code, and Approach</h2>
<hr />
<p>Week 2 wasn’t just about memorizing Python concepts; it was about <strong>understanding them in a way that felt real and practical</strong>. There was a combination of theory, small coding exercises, and full projects. Each concept wasn’t just a definition, but something I tried to bring alive with code and practice.</p>
<h3 id="heading-object-oriented-programming-oop-basics">Object-Oriented Programming (OOP) Basics</h3>
<p>I came to see <strong>classes</strong> as blueprints, and <strong>objects</strong> as the actual things built from those blueprints. For example, a <code>Car</code> class is like the idea of a car, but my actual Toyota or Honda is the object.</p>
<ul>
<li><p><strong>Attributes</strong> felt like an object’s “identity card” — they describe its properties (like colour, name, brand).</p>
</li>
<li><p><strong>Methods</strong> became the actions the object could perform, like <code>drive()</code> or <code>stop()</code>.</p>
</li>
</ul>
<p>This helped me stop seeing code as just commands and start viewing it as <strong>living objects that interact</strong>.</p>
<hr />
<h3 id="heading-inheritance-in-python">Inheritance in Python</h3>
<p>This concept clicked for me when I realised it’s just like family traits. A <strong>child class</strong> inherits features from its <strong>parent class</strong> but can also have its own unique traits.</p>
<p>For example, a <code>Dog</code> class and a <code>Cat</code> class can both inherit from <code>Animal</code> but still make their own unique sounds.</p>
<p>It made me appreciate how <strong>inheritance saves time</strong>, avoids rewriting the same code, and keeps everything neat and structured — just like organising files into folders.</p>
<hr />
<h3 id="heading-scope-and-encapsulation">Scope and Encapsulation</h3>
<p>This was tricky for me at first, because I kept mixing up which variable belonged where.</p>
<ul>
<li><p><strong>Scope</strong> taught me to respect boundaries:</p>
<ul>
<li><p>Local (inside a function),</p>
</li>
<li><p>Global (outside everything),</p>
</li>
<li><p>Non-local (a variable from an enclosing function, used inside a nested function).</p>
</li>
</ul>
</li>
<li><p><strong>Encapsulation</strong> showed me that not every detail should be exposed. By using <strong>private (</strong><code>__var</code>) and <strong>protected (</strong><code>_var</code>) attributes, I learned how to “hide” sensitive parts of a class.</p>
</li>
</ul>
<p>Getter and setter methods then gave me a controlled way to <strong>safely access or change those hidden details</strong>. It was like locking my valuables in a drawer and giving out a key only when necessary.</p>
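<p>Here is a minimal sketch of that idea. The <code>BankAccount</code> class, its method names, and the validation rule are assumptions chosen for illustration.</p>

```python
class BankAccount:
    def __init__(self, owner, balance):
        self._owner = owner        # "protected": a convention, not enforced
        self.__balance = balance   # "private": Python mangles the name

    def get_balance(self):
        """Getter: read-only access to the hidden balance."""
        return self.__balance

    def set_balance(self, amount):
        """Setter: validate before changing the hidden balance."""
        if not isinstance(amount, (int, float)) or amount < 0:
            raise ValueError("Balance must be a non-negative number")
        self.__balance = amount


account = BankAccount("Ada", 100)
account.set_balance(150)
print(account.get_balance())  # 150

# Direct access from outside the class fails.
try:
    print(account.__balance)
except AttributeError as error:
    print("AttributeError:", error)

# The attribute still exists under a mangled name, so this is privacy by
# convention rather than a hard lock.
print(account._BankAccount__balance)  # 150
```

<p>In everyday Python, the same controlled access is often written with the built-in <code>@property</code> decorator instead of explicit <code>get_</code>/<code>set_</code> methods.</p>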
<hr />
<h3 id="heading-iterators-in-python">Iterators in Python</h3>
<p>At first, I just saw loops as something Python magically did. But iterators made me realize what’s happening under the hood.</p>
<ul>
<li><p>They rely on two methods: <code>__iter__()</code> and <code>__next__()</code>.</p>
</li>
<li><p>They are what make <strong>for loops work</strong> and allow us to create custom looping behavior.</p>
</li>
</ul>
<p>The coolest part was creating my own iterator, like a countdown or even an infinite even-number generator. It showed me how flexible Python really is once you understand the mechanism.</p>
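<p>To see the mechanism for myself, it helped to write out by hand roughly what a <code>for</code> loop does. The list below is just example data.</p>

```python
colours = ["red", "green", "blue"]

# Roughly what `for colour in colours: print(colour)` does behind the scenes:
iterator = iter(colours)          # calls colours.__iter__()
while True:
    try:
        colour = next(iterator)   # calls iterator.__next__()
    except StopIteration:         # raised when nothing is left
        break
    print(colour)
```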
<hr />
<h3 id="heading-polymorphism">Polymorphism</h3>
<p>This one made me smile because it felt like the most “real-world” of them all.</p>
<p>Polymorphism means <strong>different classes can have the same method name, but each behaves differently</strong>.<br />For example:</p>
<ul>
<li><p>A <code>Car</code> might have a <code>.move()</code> method that prints “Car drives”.</p>
</li>
<li><p>A <code>Bicycle</code> might also have <code>.move()</code>, but it prints “Bicycle pedals”.</p>
</li>
</ul>
<p>The method name is the same, but the action depends on the object calling it. To me, it showed that <strong>code doesn’t have to be rigid</strong> — it can adapt to the situation, just like people do in different contexts.</p>
<hr />
<h2 id="heading-practical-application-tasks-amp-solutions">Practical Application — Tasks &amp; Solutions</h2>
<hr />
<h3 id="heading-1-core-tasks-oop-inheritance-scope-iterators-polymorphism">1. Core Tasks (OOP, Inheritance, Scope, Iterators, Polymorphism)</h3>
<h4 id="heading-oop-basics-car-class-example">OOP Basics – Car Class Example</h4>
<p><strong>What I Learnt</strong>: Classes are blueprints, objects are instances, and <code>__init__</code> initializes them.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Car</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, brand, model</span>):</span>
        self.brand = brand
        self.model = model

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">details</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"<span class="hljs-subst">{self.brand}</span> <span class="hljs-subst">{self.model}</span>"</span>

car1 = Car(<span class="hljs-string">"Toyota"</span>, <span class="hljs-string">"Corolla"</span>)
car2 = Car(<span class="hljs-string">"Honda"</span>, <span class="hljs-string">"Civic"</span>)

print(car1.details())
print(car2.details())
</code></pre>
<p>✅ Output:</p>
<pre><code class="lang-python">Toyota Corolla
Honda Civic
</code></pre>
<hr />
<h4 id="heading-inheritance-animal-example">Inheritance – Animal Example</h4>
<p><strong>What I Learnt</strong>: A child class can override its parent’s behavior.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Parent class</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Animal</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, name, sound</span>):</span>
        self.name = name
        self.sound = sound

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_sound</span>(<span class="hljs-params">self</span>):</span>
        print(<span class="hljs-string">f"<span class="hljs-subst">{self.name}</span> makes a sound !"</span>)

<span class="hljs-comment"># Child classes override make_sound()</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Dog</span>(<span class="hljs-params">Animal</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_sound</span>(<span class="hljs-params">self</span>):</span>
        print(<span class="hljs-string">f"<span class="hljs-subst">{self.name}</span> says  <span class="hljs-subst">{self.sound}</span> !"</span>)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Cat</span>(<span class="hljs-params">Animal</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_sound</span>(<span class="hljs-params">self</span>):</span>
        print(<span class="hljs-string">f"<span class="hljs-subst">{self.name}</span> says <span class="hljs-subst">{self.sound}</span> !"</span>)

dog = Dog(<span class="hljs-string">'Ariel'</span>, <span class="hljs-string">'woof'</span>)  <span class="hljs-comment"># creating objects</span>
cat = Cat(<span class="hljs-string">'Sophie'</span>, <span class="hljs-string">'Meow'</span>)

dog.make_sound()  <span class="hljs-comment"># calling their methods</span>
cat.make_sound()
</code></pre>
<p>✅ Output:</p>
<pre><code class="lang-python">Ariel says  woof !
Sophie says Meow !
</code></pre>
<hr />
<h4 id="heading-scope-example">Scope Example</h4>
<p><strong>What I Learnt</strong>: Variables may look the same but behave differently depending on scope.</p>
<pre><code class="lang-python">x = <span class="hljs-number">300</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">myfunc</span>():</span>
    x = <span class="hljs-number">200</span>
myfunc()
print(x)  <span class="hljs-comment"># prints 300 (global unchanged)</span>
</code></pre>
<p>With <code>global</code>:</p>
<pre><code class="lang-python">x = <span class="hljs-number">300</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">myfunc</span>():</span>
    <span class="hljs-keyword">global</span> x
    x = <span class="hljs-number">200</span>
myfunc()
print(x)  <span class="hljs-comment"># prints 200 (global changed)</span>
</code></pre>
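<p>For nested functions, <code>nonlocal</code> plays the same role that <code>global</code> plays at module level. A small sketch, with function names of my own choosing:</p>

```python
def make_counter():
    count = 0                  # lives in the enclosing function's scope

    def increment():
        nonlocal count         # rebind the enclosing variable, not a new local
        count += 1
        return count

    return increment


counter = make_counter()
print(counter())  # 1
print(counter())  # 2
```

<p>Without the <code>nonlocal</code> line, <code>count += 1</code> would raise <code>UnboundLocalError</code>, because Python would treat <code>count</code> as a brand-new local variable.</p>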
<hr />
<h4 id="heading-iterators-countdown-example">Iterators – CountDown Example</h4>
<p><strong>What I Learnt</strong>: Both <code>__iter__()</code> and <code>__next__()</code> are needed to build custom loops.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CountDown</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, n</span>):</span>
        self.n = n

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__iter__</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> self

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__next__</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">if</span> self.n &lt;= <span class="hljs-number">0</span>:
            <span class="hljs-keyword">raise</span> StopIteration
        current = self.n
        self.n -= <span class="hljs-number">1</span>
        <span class="hljs-keyword">return</span> current

<span class="hljs-keyword">for</span> num <span class="hljs-keyword">in</span> CountDown(<span class="hljs-number">5</span>):
    print(num)
</code></pre>
<p>✅ Output:</p>
<pre><code class="lang-python"><span class="hljs-number">5</span>
<span class="hljs-number">4</span>
<span class="hljs-number">3</span>
<span class="hljs-number">2</span>
<span class="hljs-number">1</span>
</code></pre>
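<p>The infinite even-number iterator I mentioned earlier follows the same pattern, except that <code>__next__()</code> never raises <code>StopIteration</code>. This is a sketch with a class name of my own choosing; <code>itertools.islice</code> is used to take just the first few values.</p>

```python
from itertools import islice


class EvenNumbers:
    """Infinite iterator over 0, 2, 4, ... (never raises StopIteration)."""

    def __init__(self):
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = self.current
        self.current += 2
        return value


# islice takes only the first five values, so the loop ends.
print(list(islice(EvenNumbers(), 5)))  # [0, 2, 4, 6, 8]
```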
<hr />
<h4 id="heading-polymorphism-car-amp-bicycle">Polymorphism – Car &amp; Bicycle</h4>
<p><strong>What I Learnt</strong>: Polymorphism gives flexibility — same method name, different results.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Car</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">move</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Car drives"</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Bicycle</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">move</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Bicycle pedals"</span>

vehicles = [Car(), Bicycle()]
<span class="hljs-keyword">for</span> v <span class="hljs-keyword">in</span> vehicles:
    print(v.move())
</code></pre>
<p>✅ Output:</p>
<pre><code class="lang-python">Car drives
Bicycle pedals
</code></pre>
<hr />
<h3 id="heading-2-mini-projects">2. Mini-Projects</h3>
<ul>
<li><p><strong>Counter class</strong> → Counted object creations.</p>
</li>
<li><p><strong>Shape hierarchy</strong> → Practiced inheritance with <code>Rectangle</code> and <code>Circle</code>.</p>
</li>
<li><p><strong>BankAccount system</strong> → Explored methods, subclasses, and data encapsulation.</p>
</li>
</ul>
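<p>Two of these mini-projects are small enough to sketch here. The attribute and method names (<code>created</code>, <code>area()</code>, <code>describe()</code>) are my assumptions for this illustration, not necessarily the ones from the original exercises.</p>

```python
import math


class Counter:
    """Counts how many instances have been created."""
    created = 0                      # class attribute shared by all instances

    def __init__(self):
        Counter.created += 1


class Shape:
    def area(self):
        raise NotImplementedError("Subclasses must implement area()")

    def describe(self):
        # Inherited as-is; it calls whichever area() the subclass defines.
        return f"{type(self).__name__} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


for _ in range(3):
    Counter()
print(Counter.created)  # 3

for shape in [Rectangle(3, 4), Circle(1)]:
    print(shape.describe())
```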
<hr />
<h3 id="heading-3-conceptual-assessments">3. Conceptual Assessments</h3>
<p>I tested myself with theory questions:</p>
<ul>
<li><p>Difference between <strong>attributes and methods</strong>.</p>
</li>
<li><p>Role of <code>__init__</code> and <code>__repr__</code>.</p>
</li>
<li><p>Encapsulation’s role in OOP.</p>
</li>
<li><p>Python’s <strong>scope rules</strong> (LEGB rule).</p>
</li>
</ul>
<hr />
<h2 id="heading-fun-project-library-management-system">Fun Project — Library Management System</h2>
<hr />
<h3 id="heading-bringing-oop-concepts-together">Bringing OOP Concepts Together</h3>
<p>This was the highlight of my week. I built a <strong>Library Management System</strong> that combined everything I learned:</p>
<ul>
<li><p>Classes for <code>Book</code>, <code>EBook</code>, <code>PrintedBook</code>, <code>Member</code>, <code>StudentMember</code>, <code>TeacherMember</code>.</p>
</li>
<li><p>Inheritance, polymorphism, encapsulation, and iterators in action.</p>
</li>
</ul>
<hr />
<h3 id="heading-key-features-implemented">Key Features Implemented</h3>
<ul>
<li><p><strong>Encapsulation</strong>: Protected <code>_borrowed_books</code>, private <code>__library_name</code>.</p>
</li>
<li><p><strong>Inheritance</strong>: <code>EBook</code> and <code>PrintedBook</code> extend <code>Book</code>.</p>
</li>
<li><p><strong>Polymorphism</strong>: Students vs teachers had different borrowing limits.</p>
</li>
<li><p><strong>Iterators</strong>: Allowed iteration over books and borrowed items.</p>
</li>
</ul>
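<p>A stripped-down sketch of how these features fit together is shown below. The class names and <code>_borrowed_books</code> come from my project, but the borrowing limits, the <code>borrow()</code> method, and the sample names and titles are assumptions made up for this example.</p>

```python
class Book:
    def __init__(self, title):
        self.title = title


class EBook(Book):       # inheritance: an EBook is a Book
    pass


class Member:
    borrow_limit = 2     # assumed default limit

    def __init__(self, name):
        self.name = name
        self._borrowed_books = []      # protected by convention

    def borrow(self, book):
        if len(self._borrowed_books) >= self.borrow_limit:
            raise ValueError(f"{self.name} has reached the borrowing limit")
        self._borrowed_books.append(book)

    def __iter__(self):
        # Iterating over a member yields their borrowed books.
        return iter(self._borrowed_books)


class StudentMember(Member):
    borrow_limit = 1     # assumed value


class TeacherMember(Member):
    borrow_limit = 3     # assumed value


student = StudentMember("Tolu")
teacher = TeacherMember("Mrs. Ade")
books = [Book("Python Basics"), EBook("OOP in Practice")]

# The same borrow() call behaves differently depending on the member type.
for member in (student, teacher):
    for book in books:
        try:
            member.borrow(book)
        except ValueError as error:
            print(error)

print([book.title for book in teacher])
```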
<hr />
<h3 id="heading-lessons-from-the-project">Lessons from the Project</h3>
<ul>
<li><p>OOP concepts don’t exist in isolation.</p>
</li>
<li><p>They come together naturally in real-world problems.</p>
</li>
<li><p>The project made me appreciate <strong>why OOP matters</strong>.</p>
</li>
</ul>
<hr />
<h2 id="heading-challenges-i-faced">Challenges I Faced</h2>
<p>Learning OOP was not smooth — I hit roadblocks. But each mistake taught me something valuable.</p>
<h3 id="heading-1-confusion-between-init-and-repr">1. Confusion Between <code>__init__</code> and <code>__repr__</code></h3>
<ul>
<li><p>I thought both were for printing.</p>
</li>
<li><p>Later I realized: <code>__init__</code> initializes, while <code>__repr__</code> represents.</p>
</li>
</ul>
<p>✅ <strong>Solution</strong>: Practiced small examples until clear.</p>
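<p>One of those small examples looked roughly like this. The <code>Book</code> class and its sample values are just for illustration.</p>

```python
class Book:
    def __init__(self, title, author):
        # Runs once, when the object is created: stores the data.
        self.title = title
        self.author = author

    def __repr__(self):
        # Runs whenever Python needs a text representation of the object.
        return f"Book(title={self.title!r}, author={self.author!r})"


book = Book("Half of a Yellow Sun", "Chimamanda Ngozi Adichie")
print(book)      # uses __repr__ because no __str__ is defined
print([book])    # containers use __repr__ for their items
```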
<hr />
<h3 id="heading-2-scope-and-global-keyword-errors">2. Scope and Global Keyword Errors</h3>
<ul>
<li><p>My variables didn’t update globally.</p>
</li>
<li><p>I was shadowing variables inside functions.</p>
</li>
</ul>
<p>✅ <strong>Solution</strong>: Experimented with <code>global</code> and <code>nonlocal</code> until I understood how they worked.</p>
<hr />
<h3 id="heading-3-iterators-confusion">3. Iterators Confusion</h3>
<ul>
<li><p>Got errors like:</p>
<pre><code class="lang-python">  TypeError: <span class="hljs-string">'CountDown'</span> object <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> iterable
</code></pre>
</li>
<li><p>Because I forgot <code>__iter__()</code>.</p>
</li>
</ul>
<p>✅ <strong>Solution</strong>: Learned both <code>__iter__()</code> and <code>__next__()</code> are required.</p>
<hr />
<h3 id="heading-4-inheritance-and-mro-method-resolution-order">4. Inheritance and MRO (Method Resolution Order)</h3>
<ul>
<li><p>With multiple inheritance (<code>Duck(Flyer, Swimmer)</code>), I didn’t understand why <code>super()</code> only called one parent.</p>
</li>
<li><p>Printing <code>Duck.__mro__</code> confused me.</p>
</li>
</ul>
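<p>✅ <strong>Solution</strong>: Learned that Python looks methods up in <strong>MRO order</strong> (for <code>Duck(Flyer, Swimmer)</code>: <code>Duck</code>, then <code>Flyer</code>, then <code>Swimmer</code>), and that <code>super()</code> calls only the next class in that order, so the chain continues only if each parent calls <code>super()</code> too.</p>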
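<p>A sketch that reproduces the confusion and then resolves it is shown below. <code>Duck</code>, <code>Flyer</code>, and <code>Swimmer</code> are the classes from my exercise; the <code>move()</code> method, the <code>Mover</code> base class, and the <code>Coop*</code> variants are assumptions added for this illustration.</p>

```python
class Flyer:
    def move(self):
        return ["fly"]


class Swimmer:
    def move(self):
        return ["swim"]


class Duck(Flyer, Swimmer):
    def move(self):
        # super() means "the next class in the MRO", which here is Flyer.
        return ["waddle"] + super().move()


print([cls.__name__ for cls in Duck.__mro__])  # ['Duck', 'Flyer', 'Swimmer', 'object']
print(Duck().move())                           # ['waddle', 'fly']


# Cooperative version: every class passes the call along with super().
class Mover:
    def move(self):
        return []          # end of the chain


class CoopFlyer(Mover):
    def move(self):
        return ["fly"] + super().move()


class CoopSwimmer(Mover):
    def move(self):
        return ["swim"] + super().move()


class CoopDuck(CoopFlyer, CoopSwimmer):
    def move(self):
        return ["waddle"] + super().move()


print(CoopDuck().move())                       # ['waddle', 'fly', 'swim']
```

<p>In the first version, <code>Flyer.move()</code> never calls <code>super()</code>, so the lookup stops there and <code>Swimmer</code> is never reached. In the second, each class hands the call to the next one in the MRO until the shared base class ends the chain.</p>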
<hr />
<h2 id="heading-how-i-overcame-my-challenges-summary">How I Overcame My Challenges: Summary</h2>
<ul>
<li><p><strong>Simplicity</strong>: Broke big problems into smaller steps.</p>
</li>
<li><p><strong>Study Hours</strong>: Spent several hours reading books and watching YouTube to grasp key concepts.</p>
</li>
<li><p><strong>Understanding first</strong>: I explained concepts in plain English before coding.</p>
</li>
<li><p><strong>Practice</strong>: Wrote, tested, and rewrote code until it clicked.</p>
</li>
</ul>
<hr />
<h2 id="heading-my-perspective-after-week-2">My Perspective After Week 2</h2>
<ul>
<li><p>I now think in <strong>classes and objects</strong>, not just functions.</p>
</li>
<li><p>I understand how OOP principles connect to <strong>real systems</strong>.</p>
</li>
<li><p>I feel more confident tackling new problems.</p>
</li>
</ul>
<hr />
<h2 id="heading-my-key-takeaways">My Key Takeaways</h2>
<ul>
<li><p><strong>OOP organizes code</strong> and makes it reusable.</p>
</li>
<li><p><strong>Encapsulation protects data</strong> and ensures safety.</p>
</li>
<li><p><strong>Polymorphism and Iterators add flexibility</strong>.</p>
</li>
<li><p><strong>Projects tie everything together</strong> and show the bigger picture.</p>
</li>
</ul>
<hr />
<p>✨ This is my <strong>Week 2 OOP journey</strong>, from learning the basics to building insightful applications. This mix of <strong>theory, code, and practice</strong> helped me not just understand the words, but really <strong>see how Python’s OOP concepts come alive</strong>. I’m excited for the journey ahead.</p>
<hr />
<h2 id="heading-helpful-resources"><strong>Helpful Resources</strong></h2>
<ul>
<li><p><strong><em>Books:</em></strong> <em>Python Programming Bible 2024 (3 in 1), Python All-in-One For Dummies</em></p>
</li>
<li><p><strong><em>Peer-driven learning:</em></strong> <em>Had a session with my group mate where we brainstormed and asked questions</em></p>
</li>
<li><p><strong>Weekly class</strong>: a live session that explained the week’s module in more depth and answered questions <a target="_blank" href="https://youtu.be/1dadlcXFNKY?si=EBFYp3f8YDLAs7Ie">https://youtu.be/1dadlcXFNKY?si=EBFYp3f8YDLAs7Ie</a></p>
</li>
<li><p><strong>YouTube:</strong> <a target="_blank" href="https://youtu.be/wUSDVGivd-8?si=d05zHtoyNTABnIj1">https://youtu.be/wUSDVGivd-8?si=d05zHtoyNTABnIj1</a></p>
</li>
</ul>
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758410981643/31092ed1-2092-4fec-9d32-2d2e04a53e70.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[A Step-by-Step Look at My Data Science Journey]]></title><description><![CDATA[🌱 Introduction
Foundation is the backbone of any structure built to last. Dataraflow thought it expedient to lay that foundation, from the very basics of data science, to setting up a community that thrives through peer-driven learning (luckily for ...]]></description><link>https://ogunyemi-ezekiel-timilehin.hashnode.dev/a-step-by-step-look-at-my-data-science-journey</link><guid isPermaLink="true">https://ogunyemi-ezekiel-timilehin.hashnode.dev/a-step-by-step-look-at-my-data-science-journey</guid><category><![CDATA[#Python #DataScience  #LearningInPublic  #Dataraflow]]></category><dc:creator><![CDATA[OGUNYEMI EZEKIEL TIMILEHIN]]></dc:creator><pubDate>Fri, 12 Sep 2025 23:53:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757718481927/a6ab2ac9-0d45-40b7-900f-b4efbc7d4a0c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">🌱 <em>Introduction</em></h2>
<p>A solid foundation is the backbone of any structure built to last.<br />Dataraflow set out to lay that foundation: teaching the very basics of data science, setting up a community that thrives on peer-driven learning (luckily for me, I'm in Group 6, Data Raiders), and kick-starting the lessons in the same spirit with <em>THE BASICS</em>.</p>
<p>We started with a Python crash course that ran from data types all the way to real applications.<br />From writing <code>print("Hello World")</code> to building simple projects, this first week has been quite a ride.</p>
<p>This article is my honest reflection on what I learned, the struggles I faced, how I overcame them, and my perspective on coding as a beginner.</p>
<hr />
<h2 id="heading-what-i-learned">🛠 <em>What I Learned</em></h2>
<h3 id="heading-1-python-basics"><strong>1. Python Basics</strong></h3>
<p>At first, I learned the building blocks:</p>
<ul>
<li><p>Data Types</p>
</li>
<li><p>Variables (storing information)</p>
</li>
<li><p>Arithmetic operations like sum, difference, and product</p>
</li>
</ul>
<p>For example, learning how to take two numbers and find their sum, difference, and product gave me confidence that I could actually “instruct” the computer.</p>
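<p>A tiny sketch of that exercise; the two numbers and the variable names are hypothetical values picked for illustration.</p>
<pre><code class="lang-python"># Two example numbers (hypothetical values)
first = 12
second = 5

total = first + second        # sum
difference = first - second   # difference
product = first * second      # product

print(f"Sum: {total}, Difference: {difference}, Product: {product}")
# Sum: 17, Difference: 7, Product: 60
</code></pre>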
<hr />
<h3 id="heading-2-control-flow-if-else-loops"><strong>2. Control Flow (if-else, loops)</strong></h3>
<p>Next, I explored decision-making in Python:</p>
<ul>
<li><p>Writing conditions with <code>if</code>, <code>elif</code>, and <code>else</code></p>
</li>
<li><p>Using loops (<code>for</code> and <code>while</code>) to repeat tasks</p>
</li>
</ul>
<p>A good example was the grading system program.<br />I created a program that accepts marks for 5 subjects, calculates the average, and assigns a grade (A, B, C, or F).<br />This exercise helped me see how conditions control the flow of logic.</p>
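<p>A simplified sketch of how such a grading program can be structured. The grade boundaries (70, 60, 50) and the sample marks are assumptions, not the values from the actual assignment.</p>
<pre><code class="lang-python">def assign_grade(marks):
    """Return (average, grade) for a list of subject marks."""
    average = sum(marks) / len(marks)
    # Grade boundaries are assumptions for this sketch
    if average &gt;= 70:
        grade = "A"
    elif average &gt;= 60:
        grade = "B"
    elif average &gt;= 50:
        grade = "C"
    else:
        grade = "F"
    return average, grade


marks = [72, 65, 58, 80, 61]  # hypothetical marks for 5 subjects
average, grade = assign_grade(marks)
print(f"Average: {average:.1f}, Grade: {grade}")
# Average: 67.2, Grade: B
</code></pre>
<p>The order of the checks matters: Python stops at the first condition that is true, so the highest boundary has to come first.</p>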
<hr />
<h3 id="heading-3-data-structures-lists-amp-dictionaries"><strong>3. Data Structures (Lists &amp; Dictionaries)</strong></h3>
<p>One of the most powerful things I discovered was how to organise data:</p>
<ul>
<li><p><em>Lists</em> helped me store multiple values, like marks.</p>
</li>
<li><p><em>Dictionaries</em> were useful when I built a shopping cart program, where I stored item names and their prices.</p>
</li>
</ul>
<p>These were my “aha!” moments — when I realised coding is really about structuring information.</p>
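<p>To make the difference concrete, here is a short sketch of both structures. The item names and prices are hypothetical, not the ones from my shopping cart program.</p>
<pre><code class="lang-python"># A list keeps values in order, e.g. marks for several subjects
marks = [72, 65, 58]
marks.append(80)

# A dictionary maps each item name to its price (hypothetical items)
cart = {"bread": 1500, "milk": 1200, "eggs": 3000}
cart["rice"] = 4500  # add a new item

for item, price in cart.items():
    print(f"{item}: {price}")

cart_total = sum(cart.values())
print(f"Total: {cart_total}")  # Total: 10200
</code></pre>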
<hr />
<h3 id="heading-4-fun-projects"><strong>4. Fun Projects</strong></h3>
<p>Some of my favourite projects were small but exciting:</p>
<ul>
<li><p><em>Fibonacci Sequence Generator</em> → taught me logic and loops</p>
<pre><code class="lang-python">  <span class="hljs-comment"># Start with the first two Fibonacci numbers</span>
  a, b = <span class="hljs-number">0</span>, <span class="hljs-number">1</span>
  fibonacci = [a, b]
  <span class="hljs-comment"># Add 8 more terms so the list holds 10 numbers in total</span>
  <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">8</span>):
      c = a + b
      fibonacci.append(c)
      a, b = b, c
  print(<span class="hljs-string">f"The first 10 numbers of the Fibonacci sequence are <span class="hljs-subst">{fibonacci}</span>"</span>)
</code></pre>
</li>
<li><p><em>Guessing Game</em> → made coding interactive, with hints like “too high” or “too low”</p>
<pre><code class="lang-python">  <span class="hljs-comment"># 1. The computer randomly picks a number between 1 and 20</span>
  <span class="hljs-keyword">import</span> random
  secret_number = random.randint(<span class="hljs-number">1</span>, <span class="hljs-number">20</span>)
  print(<span class="hljs-string">'Welcome to the guessing game!'</span>)
  print(<span class="hljs-string">'I have picked a number between 1 and 20. Can you guess it?'</span>)

  <span class="hljs-comment"># 2. Loop until the user guesses correctly</span>
  <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
      <span class="hljs-keyword">try</span>:
          guess = int(input(<span class="hljs-string">'Enter your guess (1-20): '</span>))
      <span class="hljs-keyword">except</span> ValueError:
          <span class="hljs-comment"># Non-numeric input would otherwise crash the game</span>
          print(<span class="hljs-string">'Please enter a whole number.'</span>)
          <span class="hljs-keyword">continue</span>
      <span class="hljs-keyword">if</span> guess &lt; secret_number:
          print(<span class="hljs-string">'Too low! Try again.'</span>)
      <span class="hljs-keyword">elif</span> guess &gt; secret_number:
          print(<span class="hljs-string">'Too high! Try again.'</span>)
      <span class="hljs-keyword">else</span>:
          print(<span class="hljs-string">f'Got it! The number was <span class="hljs-subst">{secret_number}</span>'</span>)
          <span class="hljs-keyword">break</span>
</code></pre>
</li>
</ul>
<hr />
<h3 id="heading-challenges-i-faced">⚡ <em>Challenges I Faced</em></h3>
<ul>
<li><p><em>Before this week, the code I wrote most often was SQL</em>, where indentation, case, and spacing didn’t really matter. Then came Python, and suddenly I was in a whole new world!</p>
</li>
<li><p><em>Errors everywhere 😅</em>: missing colons, wrong indentation, or mistyped variable names.<br />  At first, they frustrated me, but later I realised errors are simply feedback.</p>
</li>
<li><p><em>Understanding loops and conditions</em> — I often got wrong results because I didn’t fully grasp the logic.<br />  Practising with small examples helped me fix this.</p>
</li>
</ul>
<hr />
<h3 id="heading-my-perspective-after-week-one">🎯 My Perspective After Week One</h3>
<p>Learning data science is not just about writing code — it’s about developing <em>problem-solving skills</em> and learning to think logically.</p>
<ul>
<li><p>Errors are part of the process — not a sign of failure</p>
</li>
<li><p>Every bug I fix makes me better at programming</p>
</li>
<li><p>Consistency matters — even writing a few lines of code daily keeps the momentum alive</p>
</li>
<li><p>Real application is where true learning happens — the tasks, take-home exercises, and assignments showed me just how important these basics are</p>
</li>
<li><p>Projects (no matter how small) make learning fun and practical</p>
</li>
</ul>
<hr />
<h3 id="heading-how-i-overcame-the-struggles"><strong>🛠 <em>How I Overcame the Struggles</em></strong></h3>
<p>✅ <em>Practice, Practice, Practice</em>: I wrote the same code multiple times until I could explain it in my own words.<br />✅ <em>Debugging by Printing</em>: I used <code>print()</code> to check what my variables were doing at each step.<br />✅ <em>Asking for Help</em>: I didn’t struggle in silence. I asked questions, watched videos, read resource material, and got explanations that taught me the why behind the code.</p>
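<p>As an illustration of the debugging-by-printing habit, here is a small sketch. The <code>average</code> function and its input are hypothetical examples, not code from the course.</p>
<pre><code class="lang-python">def average(values):
    total = 0
    for index, value in enumerate(values):
        total += value
        # Temporary debug line: shows the running total at every step
        print(f"step {index}: value={value}, total={total}")
    return total / len(values)


print(average([4, 8, 6]))  # 6.0
</code></pre>
<p>Once the numbers look right at every step, the debug line can be deleted.</p>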
<hr />
<h3 id="heading-conclusion"><strong>📝 <em>Conclusion</em></strong></h3>
<p>Today, I may not be an expert, but I can confidently say I understand the core concepts of Python — and I can keep building from here.</p>
<p>This is just the beginning of my story here at <em>Dataraflow</em>, and I’m excited for what’s ahead 🚀.</p>
]]></content:encoded></item></channel></rss>