How to Clean and Enhance Messy Spreadsheet Data 100x Faster with ChatGPT
A step-by-step guide to using ChatGPT to clean up a messy, inaccurate spreadsheet, illustrated with a real investor database project.
Like many people these days, I’m a power user of ChatGPT and other LLMs, relying on them for things like drafting outlines for blog posts, generating cover letters, summarizing feedback, and much more. Generally, these tasks are long-form and text-based, with the outputs ending up in Google Docs.
Recently, I discovered an entirely new use case for LLMs I hadn’t even thought of: cleaning up messy spreadsheets, using AI to cleanse and prepare data for analysis. While I’ve always seen value in using AI for my long-form text tasks, it wasn’t until this spreadsheet use case that I saw a 100x time savings with AI.
As with any AI/LLM task, the proper prompts and workflows are critical for getting the right output. Here’s a step-by-step guide for using ChatGPT to clean and enhance your data 100x faster.
Recently, I found myself with a large data set that was messy, inaccurate, and incomplete. It’s a database of venture capitalists and angel investors that I’m putting together as a data product here at Peak Digital Studio. I’ve been using Phantombuster to scrape LinkedIn for VCs/angels and putting them into a spreadsheet. The problem with the data set is that details are missing, specifically around location, and some results aren’t investors. I wanted to clean the list up to ensure the only entries that remain are actually investors and that we have their location information.
I’ll use this specific example to show how I took a messy, inaccurate spreadsheet and ran it through ChatGPT to clean the data and ensure the output was valuable for the end user.
The following is a step-by-step guide to the data cleaning process using AI. Of course, the specific use case I’m using as an example might not exactly match what you’re doing, but the principles will generally apply. I used ChatGPT for this task since it’s the model I’m most comfortable with, so results may vary with some of the other LLMs.
To upload files, you’ll need the ChatGPT Plus plan.
Before uploading anything to ChatGPT, make sure your file is below the 100 MB upload limit. If you need to remove data or split the spreadsheet in half, you can always start with a smaller file to get the initial results, and upload the remaining spreadsheets later. Be sure your spreadsheet is in .CSV format and that the data type for each column is consistent and correctly formatted.
I also remove any extraneous columns that won’t help ChatGPT do the work I need it to.
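If your export is on the large side, this prep step is easy to script. Below is a minimal sketch in Python with pandas; the filenames and the dropped columns are placeholders for whatever your own scrape contains, and the cutoff matches the 100 MB upload limit mentioned above.

```python
import os

import pandas as pd

SOURCE = "investors_raw.csv"  # placeholder filename for your export

df = pd.read_csv(SOURCE)

# Drop extraneous columns that won't help ChatGPT with the cleaning task
# (these names are illustrative; use whatever your scrape actually has).
df = df.drop(columns=["profileImageUrl", "scrapedAt"], errors="ignore")
df.to_csv("investors_trimmed.csv", index=False)

# If the trimmed file is still over the upload limit, split it in half
# and upload the pieces in separate passes.
if os.path.getsize("investors_trimmed.csv") > 100 * 1024 * 1024:  # 100 MB
    mid = len(df) // 2
    df.iloc[:mid].to_csv("investors_part1.csv", index=False)
    df.iloc[mid:].to_csv("investors_part2.csv", index=False)
```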
Now that your spreadsheet is ready and you have the right plan, you can give ChatGPT a prompt. I use GPT-4o for this task, but have used GPT-3.5 in the past.
To create the prompt, first think of all the issues that are making your data unclean, and work back from that.
In my case, I had lots of names with weird characters (I didn’t actually know the name of these characters so I used Perplexity to ask what they were called and what prompt I should use in ChatGPT to remove them).
I also wanted to determine, using all available information I scraped from their LinkedIn, if the person was an actual investor (for some reason lots of people say they work in VC, but don’t actually). Finally, I wanted the data to have coherent geographic data for each entry so it could be filtered by region, country, and continent.
Here’s the prompt I landed on:
You are a bot that takes .CSV files and cleans and enhances the data. I’m going to provide you with a list of profiles I scraped from LinkedIn, and I’ll give you a few instructions for how to clean:
1. Please clean up and correct any mojibake or character encoding issues. Convert any special characters or symbols to their standard ASCII equivalents where possible.
2. Remove any middle initials and/or qualifications from people’s names. For example, for “Dr. Sean T. Smith, DDS”, remove the “Dr.”, “T.”, and “DDS”.
3. Using all available information about each entry, determine whether they are an actual startup investor - either a VC or an angel. Create a new field called Type, and label each as VC, Angel, or Not Investor.
4. Use the “linkedinJobLocation” and “location” fields to infer their location - for example, if someone’s entry is Miami, this tells you they are in the city of Miami, state/region of Florida, country of United States of America, and continent of North America. Create new fields called “City”, “State/Region”, “Country”, and “Continent” and generate the data in these columns. Prioritize the “location” field if there are any conflicts.
Some things to keep in mind:
1. Missing Values - If a field is absent or lacks a value in the text, make reasonable inferences whenever possible.
2. Inconsistencies - Ensure that all similar values are consistently formatted and spelled. For example, for the “state” field, “New Mexico”, “NM”, and “nm” should all be represented as “New Mexico”.
3. Final output - purely in .CSV format
With the prompt entered, you can now upload the actual spreadsheet. It takes a bit of time for ChatGPT to process, and it will give you some live updates on what it’s up to as it goes.
Be sure to review a few lines of the data, and then have ChatGPT make any fixes you notice. For example, many of my entries didn’t have proper location data, so I added another prompt after the original run to try and fix that:
Many of these entries have the city, but haven’t used that to infer the rest of the location data. Please fix
As you can see, I am very polite when I speak to my future robot overlord.
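If you’d rather not eyeball every row by hand, most of the cleanup rules from the prompt translate to a few lines of Python for spot-checking a sample of ChatGPT’s output locally. Here’s a minimal sketch, assuming pandas, a placeholder filename, and my column names (yours will differ); the state map is deliberately tiny:

```python
import re
import unicodedata

import pandas as pd

df = pd.read_csv("chatgpt_output.csv")  # placeholder filename

def to_ascii(text):
    # Fold accented or mis-encoded characters to ASCII equivalents where possible.
    if not isinstance(text, str):
        return text
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

def strip_titles(name):
    # e.g. "Dr. Sean T. Smith, DDS" -> "Sean Smith"
    if not isinstance(name, str):
        return name
    name = re.sub(r"^(Dr|Mr|Mrs|Ms)\.?\s+", "", name)  # honorifics
    name = re.sub(r"\s+[A-Z]\.\s+", " ", name)          # middle initials
    return re.sub(r",\s*[A-Z.]+$", "", name)            # trailing credentials

df["name"] = df["name"].map(to_ascii).map(strip_titles)

# Normalize state spellings, per the "New Mexico"/"NM"/"nm" rule above.
STATE_MAP = {"nm": "New Mexico", "new mexico": "New Mexico"}  # extend as needed
normalized = df["State/Region"].str.strip().str.lower().map(STATE_MAP)
df["State/Region"] = normalized.fillna(df["State/Region"])
```

Anything a check like this flags, you can feed back to ChatGPT as another fix-it prompt.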
Despite a few tries, I still had some location issues, where it was getting the country wrong. I've found sometimes it's best to start with a blank slate, and create a new chat with simpler instructions. I gave the following prompt in a new chat:
You are a bot that infers from a city’s name what state/region, country, and continent it’s in. I’ll provide a CSV file that includes a field called “City”. Ignore all other columns and use only the “City” column to infer the rest of the geographic information. Create new fields for “State/Region”, “Country”, and “Continent” and put the data in those fields.
I removed all fields from the previous .CSV ChatGPT generated for me, and just left in the city. This solved the problem.
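If you’d rather script that step, stripping the file down to one column is a one-liner; “enhanced.csv” here stands in for whatever file the earlier chat produced:

```python
import pandas as pd

# Keep only the City column so the fresh chat has nothing else to get confused by.
pd.read_csv("enhanced.csv", usecols=["City"]).to_csv("cities_only.csv", index=False)
```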
Don’t get frustrated - you might have to go back and forth a few times to get this fixed. Try to give clear, easy-to-understand prompts.
And that’s it! Now, you should have a clean and enhanced spreadsheet with high-quality data. Hopefully, this process will save you hours of busy work.
If you’d like to see the investor database I created while writing this article, check it out here. Or, if you need help setting up your own prompts or cleaning your data, get in touch and we can help.
“Clean data” refers to datasets free from errors, inconsistencies, and inaccuracies. It ensures that the data is accurate, complete, and formatted in a suitable way for data analysis and decision-making. Clean data is essential for producing reliable insights and making informed business decisions.
Consolidating Data from Multiple Sources: Combining data from multiple data sources is crucial to avoid errors, duplication, and inconsistency in the final dataset used for analysis.
Formatting Numeric Data: Ensuring numeric data is correctly formatted is essential for accurate analysis, including converting categorical data into numeric data or filling missing numeric data with specific values.
Uniformity in Data: Data must be converted to the same unit of measure to ensure consistency and enable accurate analysis.
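To make those three points concrete, here is a small illustrative sketch in pandas. The filenames, column names, and the miles-to-kilometers conversion are all assumptions for the example, not part of any particular dataset:

```python
import pandas as pd

# Consolidating: stack two exports with the same schema, dropping exact duplicates.
combined = pd.concat(
    [pd.read_csv("source_a.csv"), pd.read_csv("source_b.csv")],
    ignore_index=True,
).drop_duplicates()

# Formatting numeric data: coerce a column to numbers, filling gaps with a
# specific value (here 0).
combined["deal_count"] = pd.to_numeric(combined["deal_count"], errors="coerce").fillna(0)

# Uniformity: convert every row to one unit of measure (illustrative miles -> km).
in_miles = combined["distance_unit"] == "mi"
combined.loc[in_miles, "distance"] = combined.loc[in_miles, "distance"] * 1.60934
combined.loc[in_miles, "distance_unit"] = "km"
```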
Data analysis is the process of examining, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. It involves defining objectives, collecting data, cleaning and organizing data, using statistical tools to analyze the data, and interpreting the results to make informed decisions.
Manually cleaning data is challenging because it is time-consuming, prone to human error, and requires a thorough understanding of the data to accurately identify and correct issues. Common methods include removing duplicates, handling missing values, correcting inaccuracies, and standardizing formats.
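In pandas, each of those common methods maps to roughly one call. A quick sketch, with a placeholder filename and columns:

```python
import pandas as pd

df = pd.read_csv("dataset.csv")                     # placeholder filename

df = df.drop_duplicates()                           # remove duplicates
df["location"] = df["location"].fillna("Unknown")   # handle missing values
df["name"] = df["name"].str.strip().str.title()     # standardize text formatting
df["joined"] = pd.to_datetime(df["joined"], errors="coerce")  # one date format
```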
Exploratory Data Analysis (EDA) is an approach for summarizing and visualizing the important characteristics of a dataset, often using graphical techniques. It helps in understanding data patterns, detecting anomalies, and testing hypotheses.
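A first EDA pass over a freshly cleaned spreadsheet can be just a few calls; this sketch assumes the cleaned investor CSV from earlier (filename hypothetical):

```python
import pandas as pd

df = pd.read_csv("investors_clean.csv")   # placeholder filename

print(df.describe(include="all"))         # summary statistics for every column
print(df["Country"].value_counts())       # frequency table; misspellings stand out
print(df.isna().sum())                    # remaining missing values per column
```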
Data sets are collections of related data points typically organized in a table format, consisting of rows and columns, where each column represents a variable, and each row represents a record.