Thinking Independently and Accepting Uncertainty Help Master Projects
For this internship, I started with data research and collection projects and then moved to the industry classification project. This project was different. It asked not only for a company’s basic information but also for an industry classification using PrivCo’s Industry Classification System. The process was more complicated and required more insight into the company’s products and services in order to classify the company into the right industry. However, independent thinking and a willingness to accept uncertainty helped me master this project.
First, I needed to find the right URL. Given only the company’s name, I googled it and then compared the address listed on the website with the address in the Excel sheet. Sometimes they were the same, but sometimes they were not. Different addresses called for more individual judgment: I needed to check whether the town and state matched. If they did, the two companies were likely the same one, since a company may have changed its address. If they did not, leaving the field blank was the best and safest choice, because no information is better than wrong information, which can easily distort the model built later. Next, with the company’s website in hand, I needed to identify whether it was a private company or another kind of organization. Most of the time, the company was easily identified by “Inc.” or “LLC” after its name. Sometimes, however, it was not. For those companies, doubts arose as soon as I glanced through the website, because the pictures or other information on it looked unusual for a private company. With those doubts, I would either search further on Google to confirm my judgment or, once I was about 90% confident, double-check with my colleagues. The habit of double-checking with others helped me avoid unnecessary mistakes.
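The address check described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not the tooling used during the internship: the function name `resolve_url`, the dictionary keys `town` and `state`, and the sample values are all hypothetical.

```python
from typing import Optional


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def resolve_url(sheet_address: dict, site_address: dict, url: str) -> Optional[str]:
    """Return the candidate URL only when town and state both match.

    A differing street address is tolerated, because a company may have moved.
    If town or state differ, return None (leave the cell blank): no information
    is better than wrong information.
    """
    same_town = normalize(sheet_address["town"]) == normalize(site_address["town"])
    same_state = normalize(sheet_address["state"]) == normalize(site_address["state"])
    if same_town and same_state:
        return url
    return None


# Hypothetical example records
sheet = {"street": "12 Main St", "town": "Albany", "state": "NY"}
moved = {"street": "98 River Rd", "town": "albany ", "state": "ny"}
other = {"street": "12 Main St", "town": "Austin", "state": "TX"}

print(resolve_url(sheet, moved, "https://example.com"))
print(resolve_url(sheet, other, "https://example.com"))
```

Returning `None` rather than a best guess mirrors the choice to leave the field blank when the match is uncertain.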
Last came the most difficult part: classifying the company’s industry. The automated results did give some clues, but their accuracy was not high. About half the time, I found the classification to be wrong, especially for businesses with uncommon services. Accuracy was highest for Professional and Insurance Services and lower for the Industrial Products and Software industries. Even manual identification can lead to mistakes, so it is easy to imagine how hard the subtle differences between businesses are for machine learning models. However, the independent thinking I have been training through coursework helped me bear the uncertainty and enjoy this project. There were some “wow” moments when I came across unique businesses. I also learned not to be afraid to ask for help: I frequently asked for help to build a more concrete understanding of industries and to improve the quality of my work.