Data normalization and why it matters

Article Published by Geotab
Author: Melanie Serr, Senior Content Editor

How do you take information from cars, trucks and buses of many different makes and models from all around the world and present it in a format that is easy to work with? Enter a little-discussed yet important part of telematics called data normalization. This post takes you behind the scenes to explore this fascinating process.

What is data normalization and why is it important?

Data normalization is the process of cleaning, filtering and standardizing data. Why is this process so significant? Without it, fleet management would be much more complicated and time-consuming. Fleets would need to go through multiple sources to try to access information from their vehicles. Furthermore, the available data might be limited in some cases, depending on the vehicle type or manufacturer, and might arrive in different formats.
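
To make the three steps concrete, here is a minimal sketch in Python. The record fields, function names and unit table are hypothetical and chosen only for illustration; this is not how any particular telematics platform implements normalization.

```python
MPH_TO_KMH = 1.609344

# Hypothetical table of units this sketch knows how to convert to km/h.
KNOWN_UNITS = {"km/h": 1.0, "mph": MPH_TO_KMH}


def clean(record):
    """Cleaning: trim text fields and parse the speed; None if unparseable."""
    try:
        speed = float(str(record["speed"]).strip())
    except (KeyError, ValueError):
        return None
    return {
        "vehicle": str(record.get("vehicle", "")).strip(),
        "speed": speed,
        "unit": str(record.get("unit", "")).strip().lower(),
    }


def keep(record):
    """Filtering: drop records with no vehicle, a negative speed or an unknown unit."""
    return (
        bool(record["vehicle"])
        and record["speed"] >= 0
        and record["unit"] in KNOWN_UNITS
    )


def standardize(record):
    """Standardizing: express every speed in km/h under one field name."""
    factor = KNOWN_UNITS[record["unit"]]
    return {
        "vehicle": record["vehicle"],
        "speed_kmh": round(record["speed"] * factor, 1),
    }


def normalize_records(records):
    cleaned = (clean(r) for r in records)
    return [standardize(r) for r in cleaned if r is not None and keep(r)]
```

Given a mixed batch of records, some in mph, some in km/h and some unusable, normalize_records returns only the usable ones, all in the same shape and unit.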

Vehicle data is complex

Every vehicle manufacturer, right down to the individual vehicle model, has its own set of codes, data formatting, order of data and so on. A single make/model could even change multiple codes and definitions within one model year.

Vehicles also use a mixture of connection types, such as the CAN-BUS found on trucks and large buses versus the OBD II port typically found on light-duty vehicles. International and mixed fleets will be familiar with this issue. There are also specific connection types and protocols governing equipment like generators and forklifts.

The protocols themselves also evolve over time, as in the change from SAE J1708 to SAE J1939. Many heavy-duty vehicles internationally still use J1708, so Geotab must be able to translate their data so that it is comparable with data from newer vehicles using the SAE J1939 protocol.
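
As a rough illustration of what such a translation involves, the sketch below maps a road speed reading from three different protocols onto one common value in km/h. The table layout and names (DECODERS, to_common, "road_speed_kmh") are hypothetical and are not Geotab's implementation. The scale factors follow commonly documented values (J1708 readings are usually carried by the J1587 message layer), but treat them as assumptions and check them against the relevant standards before relying on them.

```python
from dataclasses import dataclass

MPH_TO_KMH = 1.609344

# (protocol, parameter id) -> (common name, raw-value-to-km/h conversion).
# Scale factors are assumptions based on commonly documented values.
DECODERS = {
    ("J1708", 84): ("road_speed_kmh", lambda raw: raw * 0.5 * MPH_TO_KMH),  # 0.5 mph per bit
    ("J1939", 84): ("road_speed_kmh", lambda raw: raw / 256.0),             # 1/256 km/h per bit
    ("OBD2", 0x0D): ("road_speed_kmh", lambda raw: float(raw)),             # 1 km/h per bit
}


@dataclass
class Reading:
    name: str
    value: float


def to_common(protocol, param_id, raw):
    """Translate one raw reading into the common form, or None if unknown."""
    entry = DECODERS.get((protocol, param_id))
    if entry is None:
        return None
    name, convert = entry
    return Reading(name, convert(raw))
```

The design point is that everything protocol-specific lives in one lookup table, so supporting a new protocol or a changed code means adding a table entry rather than changing the code that consumes the data.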

Geotab translates vehicle data for fleet use 

Geotab acts as one giant data language library and translator for the fleet industry. We provide access to data from almost every make and model of car, van and truck, including electric vehicles (EVs) and plug-in hybrid electric vehicles (PHEVs).

With our experience in telematics, Geotab has developed a way to access all these data language protocols and translate them so that fleets can accurately use and compare the data. Since the auto and truck industry is constantly evolving, we stay on top of changes and update our system with each new model. We do this through what we call Geotab Status Data IDs.

Why does it matter? How data normalization impacts fleets

Geotab users don’t see this process, but what they do see is seamless access to all their fleet vehicles in one telematics platform. For example, a fleet manager can create a fleet rule for speeding or idling that applies to all vehicles, without having to create different rules for every type of protocol language. This also applies to fleet reporting and setting up alerts.
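
The benefit for rules can be sketched in a few lines of Python. Once every reading is in the same unit and shape, a single rule covers the whole fleet regardless of the protocol each vehicle speaks. The Reading type, the speeding function and the sample values are hypothetical and do not represent the MyGeotab rules engine.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    vehicle: str
    source_protocol: str  # kept for reference only; the rule never looks at it
    speed_kmh: float      # already normalized to km/h


def speeding(readings, limit_kmh=100.0):
    """Return the vehicles whose normalized speed exceeds the limit."""
    return [r.vehicle for r in readings if r.speed_kmh > limit_kmh]


fleet = [
    Reading("truck-7", "J1939", 104.5),
    Reading("bus-1", "J1708", 88.0),
    Reading("car-3", "OBD2", 121.0),
]
flagged = speeding(fleet)
```

Here flagged contains truck-7 and car-3. The rule itself contains no protocol-specific logic, which is the practical payoff of normalizing first.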

No matter how the data is formatted, by the time the fleet manager accesses MyGeotab, all the different protocol languages have been reverse engineered into a single Geotab language.

Comparing Vehicle A to Vehicle B is now possible without the user needing to think about any of this.

Without data normalization, fleets would find it difficult to compare Vehicle A to Vehicle B directly and quickly, and in general would find very little use for vehicle data. It might even be necessary to hire more people just to do data entry and analysis.

As competition and the pressure to reduce costs increase, having the ability to access and analyze your fleet intelligence becomes ever more vital. In other words, data normalization, while not a highly visible process, is nevertheless critical to helping fleets achieve their goals for increasing driver safety and improving efficiency, sustainability and profitability.

About the author: Melanie Serr is a Senior Content Editor for Geotab with an eye on fleet safety and all things tech. Follow her on Twitter at @mel_serr.

