Software Names are Unstructured Datasets
Working with unstructured datasets is always a challenge. It is hard to know what you're comparing or where to look for specific information. Nearly as bad are semi-structured data: data that pretend to be organized but don't adhere to any consistent rules or structure. Such data are hard to review and hard to rely on.
This is the situation we find ourselves in with software data. Software publishers, titles, and versions are not always as consistent as you would like them to be. There is no required standard for the publisher, name, or version that vendors write to the registry when software is installed, and the result is rather inconsistent data.
Publishers are rarely consistent with the name they use. For example, we often see Microsoft in many forms: "Microsoft," "Microsoft Corporation," and "Microsoft Corp." Further, software is sometimes delivered by a packager such as Citrix, which registers its own name instead of the original publisher's.
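To make the publisher problem concrete, here is a minimal Python sketch of rule-based publisher normalization. It is not Lansweeper's implementation: the alias table `CANONICAL_PUBLISHERS`, the suffix list `SUFFIXES`, and both function names are hypothetical. The idea is to reduce each raw value to a comparison key (lowercase, no punctuation, no trailing corporate suffix) and look that key up in a table of canonical names.

```python
import re

# Hypothetical alias table: normalized key -> canonical publisher name.
# Entries are illustrative assumptions, not Lansweeper's actual rules.
CANONICAL_PUBLISHERS = {
    "microsoft": "Microsoft",
}

# Corporate suffixes that commonly vary between registry entries (assumed list).
SUFFIXES = ("corporation", "corp", "inc", "incorporated", "ltd", "llc")


def publisher_key(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing corporate suffixes."""
    words = re.sub(r"[^\w\s]", " ", raw).lower().split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def normalize_publisher(raw: str) -> str:
    """Return the canonical publisher name, or the trimmed input if the key is unknown."""
    return CANONICAL_PUBLISHERS.get(publisher_key(raw), raw.strip())
```

With this table, "Microsoft," "Microsoft Corporation," and "Microsoft Corp." all map to "Microsoft," while "Delivered by Citrix" passes through unchanged. That last case shows the limit of string rules: the publisher field alone cannot reveal which vendor a packager is redistributing, so other signals such as the software name have to be consulted.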
Software names are further complicated by editions, suites, and sometimes by the version number itself being included in the name field. While this gives complete information about any single installation, it makes it hard to summarize all installations in an environment. For example, "Microsoft Office 2010," "Microsoft Office Professional Plus 2010," and "Microsoft Office Professional 2010" each appear in my test environment.
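One way to summarize names like these is to pull the release year and edition out of the name so that the remaining base product can be grouped. The Python sketch below assumes a small, hypothetical edition vocabulary (`EDITIONS`) and treats any four-digit year from 1900 to 2099 as the release; `ParsedTitle` and `parse_title` are illustrative names, not part of any Lansweeper API.

```python
import re
from typing import NamedTuple, Optional


class ParsedTitle(NamedTuple):
    product: str            # base product, e.g. "Microsoft Office"
    edition: Optional[str]  # e.g. "Professional Plus"
    release: Optional[str]  # e.g. "2010"


# Hypothetical edition vocabulary; a real list would be far longer.
# Order matters: "Professional Plus" is checked before "Professional".
EDITIONS = ("Professional Plus", "Professional", "Standard", "Basic")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def parse_title(name: str) -> ParsedTitle:
    """Split a raw software name into base product, edition, and release year."""
    release = None
    match = YEAR.search(name)
    if match:
        release = match.group(0)
        name = name[:match.start()] + name[match.end():]
    edition = None
    for candidate in EDITIONS:
        # Also swallow an optional trailing "Edition", as in "Standard Edition".
        pattern = re.compile(rf"\b{re.escape(candidate)}(?: Edition)?\b")
        if pattern.search(name):
            edition = candidate
            name = pattern.sub("", name, count=1)
            break
    product = " ".join(name.split())  # collapse leftover whitespace
    return ParsedTitle(product, edition, release)
```

For example, `parse_title("Microsoft Office Professional Plus 2010")` yields product "Microsoft Office," edition "Professional Plus," and release "2010," so it groups with "Microsoft Office 2010." Simple rules break down quickly, though: a localized name such as "Microsoft Office Professional Editie 2003" leaves "Editie" in the product, which is why translation and review are needed on top of parsing.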
Software versions suffer from their own lack of consistency. While we can often find a major.minor.patch structure (as in Semantic Versioning), that is not always the case. Letters and other information, such as years and editions, often make their way into the version field.
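A best-effort version parser for such strings might read the leading numeric components and keep everything after them as an opaque remainder. The Python sketch below is an assumption-laden illustration, not a full Semantic Versioning parser (SemVer proper uses `-` for pre-release and `+` for build metadata, which registry strings rarely follow). It also assumes that missing minor or patch components default to 0; `ParsedVersion` and `parse_version` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ParsedVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    extra: str  # whatever trails the numeric part: build numbers, letters, etc.


# Up to three leading dot-separated numbers, then any remainder.
VERSION = re.compile(r"^(\d+)(?:\.(\d+))?(?:\.(\d+))?(.*)$")


def parse_version(raw: str) -> Optional[ParsedVersion]:
    """Split a version string into major.minor.patch plus an opaque remainder.

    Returns None when the string does not start with a digit.
    Missing minor/patch components default to 0 (an assumption).
    """
    match = VERSION.match(raw.strip())
    if not match:
        return None
    major, minor, patch, extra = match.groups()
    return ParsedVersion(int(major), int(minor or 0), int(patch or 0), extra)
```

Applied to the sample data, "15.0.4420a" becomes (15, 0, 4420, "a") and "12.0.4518.345" becomes (12, 0, 4518, ".345"). Keeping the remainder as a string, rather than discarding it, means no information from the registry is lost while the numeric part becomes sortable.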
Some example data from my test environment:
| Software name | Publisher | Version |
|---|---|---|
| Microsoft Office 2010 | Delivered by Citrix | 1.0.0 |
| Microsoft Office Professional Plus 2010 | Microsoft Corporation | 14.0.6029 |
| Microsoft Office Professional 2010 | Microsoft Corporation | 15.0.4420a |
| Microsoft Office 2003 Web Components | Microsoft Corporation | 12.0.4518 |
| Microsoft Office Professional Editie 2003 | Microsoft Corp | 11.0.8173 |
| Microsoft Office Basic Edition 2003 | Microsoft Corporation | 12.0.4518.345 |
| Microsoft Office Standard Edition 2003 | Microsoft | 11.0.5614 |
We've heard customers remark for quite some time that they struggle to report reliably on their software and to understand what is installed where. A reliable software inventory is needed for licensing, compliance, and security. Because we want to make software data more useful, we are making strides to standardize and structure it.
Machine Learning to the Rescue
To start, over many iterations, we translated and parsed the raw data and established rules to give it an initial structure. We added a "release" field to keep track of releases and editions such as "2010" or "Professional." Next, we reviewed thousands of software titles and made corrections where needed. Using this structured data, we have begun training a machine learning model that is getting better with each iteration. We have some additional model training to do in the coming months.
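Leaving the machine learning model aside, the structured record and the rule that reviewed corrections take precedence over automatic parsing can be sketched in a few lines of Python. Everything here is hypothetical: the `SoftwareRecord` fields, keying corrections on the raw (publisher, name) pair, and the `summarize` helper are assumptions for illustration, not Lansweeper's schema or pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class SoftwareRecord:
    publisher: str
    name: str
    release: Optional[str]  # release or edition, e.g. "2010" or "Professional"
    version: str


# Hypothetical manual corrections, keyed on the raw (publisher, name) pair
# exactly as read from the registry.
Corrections = Dict[Tuple[str, str], SoftwareRecord]


def apply_correction(raw_publisher: str, raw_name: str,
                     auto: SoftwareRecord,
                     corrections: Corrections) -> SoftwareRecord:
    """Prefer a human-reviewed record; fall back to the rule-based one."""
    return corrections.get((raw_publisher, raw_name), auto)


def summarize(records: Iterable[SoftwareRecord]) -> Counter:
    """Count installations per standardized (publisher, name, release)."""
    return Counter((r.publisher, r.name, r.release) for r in records)
```

For instance, a reviewer could map the raw pair ("Delivered by Citrix", "Microsoft Office 2010") to a record with publisher "Microsoft," so that `summarize` counts the Citrix-packaged installation together with the other Microsoft Office 2010 installations. Reviewed records like these are also the kind of labeled examples a model can learn from.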
Customers in our Early Access program can already see these preliminary results and the direction we're heading in. We'd love to hear your feedback on it! As the data improve, we are also bringing new software features to the fore. The following features are now available for all cloud sites:
- A new Software tab has been added to the main navigation to let you view all your software with ease.
- Search for a particular software title or publisher using the search box in the upper right.
- View details of a software title, including its versions and scanned titles, by clicking the software in the All Software list or a filtered view.
- View standardized software on asset pages.
- Build a report using the standardized software data.
- Coming next on the roadmap: seeing the assets where a software title is installed, directly from the software detail page.
Give Us Your Feedback
We would love for you to check out your standardized software information on the Lansweeper Cloud platform and let us know what you think. We plan to keep iterating on these features and want to make them as useful to you as possible.
As always, we welcome any feedback you might have, and we're happy to give you a helping hand if you run into issues. Have an idea of what we should add? Spotted inconsistencies? Let us know on our feedback platform.
Discovery Gerdes from Lansweeper