This past week I have found myself reading hundreds of company descriptions that we have scraped from their company websites. Every website is different. Each company wants to showcase their brand and show a complete and polished version of what they are. This seems great! Companies shine and websites are not all cookie-cutter. However, when trying to automate the process, I have found that not all descriptions are created equal and that websites are not always deliberate with how they treat their description fields.
Currently, we look in three different places in a website’s HTML in hopes of finding the company’s description, then sort through the results and pick the “best” one. The three places are JSON-LD, the meta description tag, and Open Graph.
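To make the three locations concrete, here is a minimal sketch of how they could be read using only Python's standard library. The class and function names (`DescriptionParser`, `extract_descriptions`) and the source labels are made up for illustration, and the sketch assumes the JSON-LD description sits in a top-level `description` field. It is not the scraper we actually run.

```python
import json
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects candidate descriptions from JSON-LD, meta description, and Open Graph."""

    def __init__(self):
        super().__init__()
        self.found = {}            # source label -> description text
        self._in_jsonld = False    # True while inside a JSON-LD <script>
        self._jsonld_chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            content = (attrs.get("content") or "").strip()
            if not content:
                return
            if (attrs.get("name") or "").lower() == "description":
                self.found.setdefault("meta", content)
            elif (attrs.get("property") or "").lower() == "og:description":
                self.found.setdefault("opengraph", content)
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._jsonld_chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._jsonld_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                payload = json.loads("".join(self._jsonld_chunks))
            except ValueError:
                return  # malformed JSON-LD: treat as "no description here"
            # Assumption: only top-level objects (or a top-level list) are checked.
            items = payload if isinstance(payload, list) else [payload]
            for item in items:
                if isinstance(item, dict) and isinstance(item.get("description"), str):
                    text = item["description"].strip()
                    if text:
                        self.found.setdefault("jsonld", text)


def extract_descriptions(html):
    """Return a dict with zero to three entries, one per location that had a description."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

A page with none of the three fields simply yields an empty dict, which corresponds to the "zero descriptions" case.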
What makes a “best” company description?
The first thing you think of when writing a company description is summarizing your company in a simple and effective way. You want to make sure the heart of your company is portrayed in one to three sentences.
Unfortunately, when sifting through hundreds of websites, I can’t look at each company and pick the best summarization by hand from their website or what we have scraped. Instead, I had to come up with arbitrary rules for what “best” means that would result in an accurate description for as many companies as possible.
The first thing I had to consider was how many descriptions we got from a website in the first place. We are looking in three places, so the answer in my case is zero to three. Of the 100 websites I used as test cases, 42 yielded no description at all and 45 had only one. Of those with more than one, very few were consistent across locations. Considering how important it is to be able to summarize your business simply, these are scary numbers.
In the cases where there was only one option, I still needed to make sure it was a good summary. To do this I counted the words in the description. If there were fewer than five, I just threw it out. Odds are it was a mistake or a tagline, and either would make a poor description.
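The word-count check is tiny, but here is a sketch of it for completeness. The names `MIN_WORDS` and `is_usable` are made up for illustration, and "word" is assumed to mean anything separated by whitespace.

```python
MIN_WORDS = 5  # descriptions with fewer words than this are thrown out


def is_usable(description):
    """Return True when the description has at least MIN_WORDS whitespace-separated words."""
    return len(description.split()) >= MIN_WORDS
```

With this rule, a tagline like "Quality you can trust" (four words) is rejected, while a five-word sentence passes.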
When there were two or three available descriptions, I had to build a hierarchy for which description we wanted most. This is what I landed on, in order of preference:
- A description that is two sentences long
- A description that is three sentences long
- A description that is one sentence long
- The shortest available
If two descriptions had the same number of sentences, the one with the greater number of characters was chosen.
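The hierarchy above can be sketched in a few lines of Python. Two things here are assumptions rather than part of the rules as stated: sentences are counted naively by splitting after `.`, `!`, or `?`, and "shortest available" is taken to mean fewest characters. The names `count_sentences` and `pick_best` are made up for this example.

```python
import re

# Preferred sentence counts, best first: two, then three, then one.
PREFERRED_SENTENCE_COUNTS = (2, 3, 1)


def count_sentences(text):
    """Naive sentence count: split on whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def pick_best(descriptions):
    """Pick one description from a list of candidates, or None if the list is empty."""
    if not descriptions:
        return None
    for wanted in PREFERRED_SENTENCE_COUNTS:
        matches = [d for d in descriptions if count_sentences(d) == wanted]
        if matches:
            # Tie-break: same sentence count, so take the one with more characters.
            return max(matches, key=len)
    # Nothing has one to three sentences: fall back to the shortest candidate.
    return min(descriptions, key=len)
```

A naive splitter like this will miscount text containing abbreviations such as "Inc." or "U.S.", so a real pipeline would likely want something sturdier.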
Originally, the description filter simply took the longest available description. This caused problems because some websites gave us short essays describing the company instead of a summary. “Best” for us is two sentences, then three, then one, because in my mind two sentences is the ideal length: long enough to hold all the needed information, but not so long as to include promotions or other irrelevant information. It would have been easier to prefer one sentence over everything else and value succinctness; however, most of the time one sentence isn’t enough.
The final step is a character limit. We decided that 256 characters should be the limit for descriptions. For any that were longer, we removed sentences from the end until the description was short enough or – if it was only one sentence long – cut it off and added “…” to the end. For reference, the following paragraph is 256 characters long.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras fringilla, lectus quis eleifend laoreet, enim purus aliquet ex, sed accumsan arcu dui nec leo. Pellentesque maximus in elit dignissim iaculis. Sed vitae odio velit. Nulla rhoncus nisl efficitur.
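Here is a rough sketch of that trimming step. It assumes the same naive sentence splitting on `.`, `!`, or `?`, and it assumes the “…” counts toward the limit, which is a choice made for this example. The names `MAX_CHARS` and `fit_to_limit` are illustrative only.

```python
import re

MAX_CHARS = 256


def fit_to_limit(text, limit=MAX_CHARS):
    """Drop trailing sentences until the text fits; hard-cut a single long sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Remove the last sentence repeatedly while more than one remains.
    while len(sentences) > 1 and len(" ".join(sentences)) > limit:
        sentences.pop()
    result = " ".join(sentences)
    if len(result) > limit:
        # Only one sentence left and it is still too long: cut and mark with an ellipsis.
        result = result[: limit - 1].rstrip() + "…"
    return result
```

Dropping whole sentences first keeps the result readable; the hard cut is only a last resort for a single run-on sentence.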
A Few Things to Think About for Your Company
Someone at some point will be trying to scrape your website. Whether that is me trying to find a good description or someone else, these scrapers help you get noticed, listed, and represented across the web. What they can access will make an impact on how they portray your company. When making your website you have control over how easy it is for scrapers to get data from you. Accessibility and consistency are the main things to think about. Can someone get the data they want? What data do you want scrapers to have access to? Where is each type of data found and is it consistent across those locations? Don’t limit these questions just to your main website. Is your company’s description the same on Twitter, Facebook, and your website? Do you have the same logo in each of these locations? Accessibility and consistency will help you get the best version of your company represented across the internet.