Types of data

Last updated 2018 December 27

These tutorials address an older version of D3 (3.x) and will no longer be updated. See my book Interactive Data Visualization for the Web, 2nd Ed. to learn all about the current version of D3 (4.x).

D3 is extremely flexible about its input data. This topic introduces data structures commonly used with JavaScript and D3.

Variables

A variable is a datum, the smallest building block of data. The variable is the foundation of all other data structures, which are simply different configurations of variables.

If you’re new to JavaScript, know that it is a loosely typed language, meaning you don’t have to specify what type of information will be stored in a variable in advance. Many other languages, like Java (which is completely different from JavaScript!), require you to declare a variable’s type, such as int, float, boolean, or String.

//Declaring variables in Java
int number = 5;
float value = 12.3467f;  //Java needs the f suffix on a float literal
boolean active = true;
String text = "Crystal clear";

JavaScript, however, automatically types a variable based on what kind of information you assign to it. (Note that '' or "" indicate string values. I prefer double quotation marks "", but some people like singles ''.)

//Declaring variables in JavaScript
var number = 5;
var value = 12.3467;
var active = true;
var text = "Crystal clear";

How boring — var, var, var, var! — yet handy, as we can declare and name variables before we even know what type of data will go into them. You can even change a variable’s type on-the-fly without JavaScript freaking out on you.

var value = 100;
value = 99.9999;
value = false;
value = "This can't possibly work.";
value = "Argh, it does work! No errorzzzz!";
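If you want to see that retyping happen, JavaScript’s built-in typeof operator reports the current type of whatever a variable holds. Here is a minimal sketch; the names sample and seenTypes are just placeholders for this illustration.

```javascript
//Record the type each time the same variable is reassigned
var sample = 100;
var seenTypes = [ typeof sample ];   //Starts as a number

sample = false;
seenTypes.push(typeof sample);       //Now a boolean

sample = "Still no errors";
seenTypes.push(typeof sample);       //Now a string

console.log(seenTypes.join(", "));   //Prints number, boolean, string
```

The variable itself never has a fixed type; only the value it currently holds does.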

Arrays

An array is a sequence of values, conveniently stored in a single variable.

Keeping track of related values in separate variables is inefficient:

var numberA = 5;
var numberB = 10;
var numberC = 15;
var numberD = 20;
var numberE = 25;

Rewritten as an array, those values are much simpler. Hard brackets [] indicate an array, and the values inside are separated by commas:

var numbers = [ 5, 10, 15, 20, 25 ];

Arrays are ubiquitous in data visualization, so you should become very comfortable with them. You can access (retrieve) a value in an array by using bracket notation:

numbers[2]  //Returns 15

The numeral in the bracket refers to a corresponding position in the array. Remember, array positions begin counting at zero, so the first position is 0, the second position is 1, and so on.

numbers[0]  //Returns 5
numbers[1]  //Returns 10
numbers[2]  //Returns 15
numbers[3]  //Returns 20
numbers[4]  //Returns 25

Some people find it helpful to think of arrays in spatial terms, as though they have rows and columns, like in a spreadsheet:

 ID | Value
------------
 0  |  5
 1  |  10
 2  |  15
 3  |  20
 4  |  25

Arrays can contain any type of data, not just integers.

var percentages = [ 0.55, 0.32, 0.91 ];
var names = [ "Ernie", "Bert", "Oscar" ];

percentages[1]  //Returns 0.32
names[1]        //Returns "Bert"
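A few other array basics are worth trying alongside bracket notation. The sketch below reuses the names array from above; the variables count, last, and missing, and the extra value "Grover", are made up for this illustration.

```javascript
var names = [ "Ernie", "Bert", "Oscar" ];

var count = names.length;            //3 values in the array
var last = names[names.length - 1];  //"Oscar", since positions start at 0
var missing = names[3];              //undefined, because nothing is stored there

//"Grover" is a made-up extra value, just for illustration
names.push("Grover");                //Adds a value to the end of the array
console.log(names.length);           //Prints 4
```

Note that asking for a position that doesn’t exist returns undefined rather than raising an error, so a typo in an index can fail silently.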

What Arrays Are Made for()

Code-based data visualization would not be possible without arrays and the mighty for() loop. Together, they form a data geek’s dynamic duo. (If you do not consider yourself a “data geek,” then may I remind you that you are reading a document titled “Types of data.”)

An array organizes lots of data values in one convenient place. Then for() can quickly “loop” through every value in an array and perform some action with it, such as expressing the value in visual form. D3 often manages this looping for us, with its magical data() method, but it’s important to be able to write your own loops.

I won’t go into the mechanics of for() loops here; that’s a whole separate tutorial. But note this example, which loops through the numbers values from above.

for (var i = 0; i < numbers.length; i++) {
    console.log(numbers[i]);  //Print value to console
}

See that numbers.length? That’s the beautiful part. If numbers is ten positions long, the loop will run ten times. If it’s ten million positions long… yeah, you get it. This is what computers are good at: taking a set of instructions and executing them over and over. And this is at the heart of why data visualization can be so rewarding — you design and code the visualization system, and the system will respond appropriately, even as you feed it different data. The system’s mapping rules are consistent, even when the data are not.
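To make the loop do something beyond printing, here is a small sketch that adds up the numbers array and then builds a second array from it. The names total and doubled are placeholders, and the forEach() version is simply an alternative way to write the same kind of loop.

```javascript
var numbers = [ 5, 10, 15, 20, 25 ];

//Add up every value with a for() loop
var total = 0;
for (var i = 0; i < numbers.length; i++) {
    total += numbers[i];
}
console.log(total);  //Prints 75

//The array's own forEach() method visits each value in turn
var doubled = [];
numbers.forEach(function(d) {
    doubled.push(d * 2);
});
console.log(doubled.join(", "));  //Prints 10, 20, 30, 40, 50
```

Neither loop mentions how many values there are, so the same code keeps working if numbers grows or shrinks.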

Objects

Arrays are great for simple lists of values, but with more complex data sets, you’ll want to put your data into an object. For our purposes, think of a JavaScript object as a custom data structure. We use curly brackets {} to indicate an object. In between the brackets, we include indices (more formally called keys or property names) and values. A colon : separates each index and its value, and a comma separates each index/value pair.

var fruit = {
    kind: "grape",
    color: "red",
    quantity: 12,
    tasty: true
};

To reference each value, we use dot notation, specifying the name of the index:

fruit.kind      //Returns "grape"
fruit.color     //Returns "red"
fruit.quantity  //Returns 12
fruit.tasty     //Returns true

Think of the value as “belonging” to the object. Oh, look, some fruit. “What kind of fruit is that?” you might ask. As it turns out, fruit.kind is "grape". “Are they tasty?” Oh, definitely, because fruit.tasty is true.
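Dot notation is not the only way in. As a rough sketch using the same fruit object, bracket notation accepts the index name as a string, which helps when the name is stored in a variable. The variables index, viaBrackets, and indices are hypothetical names for this example.

```javascript
var fruit = {
    kind: "grape",
    color: "red",
    quantity: 12,
    tasty: true
};

//Bracket notation takes the index name as a string
var index = "color";
var viaBrackets = fruit[index];       //"red", same as fruit.color

//Values can be changed after the object is created
fruit.quantity = fruit.quantity - 1;  //We ate one, so now 11

//Object.keys() returns an array of the object's index names
var indices = Object.keys(fruit);
console.log(indices.join(", "));      //Prints kind, color, quantity, tasty
```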

Objects + Arrays

You can combine these two structures to create arrays of objects, or objects of arrays, or objects of objects or, well, basically whatever structure makes sense for your data set.

Let’s say we have acquired a couple more pieces of fruit, and we want to expand our catalogue accordingly. We use hard brackets [] on the outside, to indicate an array, followed by curly brackets {} and object notation on the inside, with each object separated by a comma.

var fruits = [
    {
        kind: "grape",
        color: "red",
        quantity: 12,
        tasty: true
    },
    {
        kind: "kiwi",
        color: "brown",
        quantity: 98,
        tasty: true
    },
    {
        kind: "banana",
        color: "yellow",
        quantity: 0,
        tasty: true
    }
];

To access this data, we just follow the trail of indices down to the values we want. Remember, [] means array, and {} means object. fruits is an array, so first we use bracket notation to specify an array index:

fruits[1]

Next, each array element is an object, so just tack on a dot and an index:

fruits[1].quantity  //Returns 98

Here’s a map of how to access every value in the fruits array of objects:

fruits[0].kind      ==  "grape"
fruits[0].color     ==  "red"
fruits[0].quantity  ==  12
fruits[0].tasty     ==  true

fruits[1].kind      ==  "kiwi"
fruits[1].color     ==  "brown"
fruits[1].quantity  ==  98
fruits[1].tasty     ==  true

fruits[2].kind      ==  "banana"
fruits[2].color     ==  "yellow"
fruits[2].quantity  ==  0
fruits[2].tasty     ==  true

Yes, that’s right, we have fruits[2].quantity bananas.
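Since fruits is an array, the for() loop from earlier works on it too. The sketch below walks through each object to build a text label and a running total; labels and totalQuantity are made-up names for this illustration.

```javascript
var fruits = [
    { kind: "grape",  color: "red",    quantity: 12, tasty: true },
    { kind: "kiwi",   color: "brown",  quantity: 98, tasty: true },
    { kind: "banana", color: "yellow", quantity: 0,  tasty: true }
];

var totalQuantity = 0;
var labels = [];

for (var i = 0; i < fruits.length; i++) {
    var f = fruits[i];  //The object at the current position
    totalQuantity += f.quantity;
    labels.push(f.quantity + " " + f.color + " " + f.kind);
}

console.log(labels.join("; "));  //Prints 12 red grape; 98 brown kiwi; 0 yellow banana
console.log(totalQuantity);      //Prints 110
```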

JSON

At some point in your D3 career, you will encounter JavaScript Object Notation. You can read up on the details, but JSON is basically a specific syntax for organizing data as JavaScript objects. The syntax is optimized for use with JavaScript (obviously) and AJAX requests, which is why you’ll see a lot of web-based APIs that spit out data as JSON. It’s faster and easier to parse with JavaScript than XML, and of course D3 works well with it.

All that, and it doesn’t look much weirder than what we’ve already seen:

var jsonFruit = {
    "kind": "grape",
    "color": "red",
    "quantity": 12,
    "tasty": true
};

The only difference here is that our indices are now surrounded by double quotation marks "", making them string values.
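When JSON arrives from a file or an API, it usually arrives as a string of text rather than as a ready-made object. JavaScript’s built-in JSON.stringify() and JSON.parse() convert between the two. This is only a minimal sketch of that round trip; jsonText and parsed are placeholder names.

```javascript
var fruit = {
    kind: "grape",
    color: "red",
    quantity: 12,
    tasty: true
};

//Object to JSON text
var jsonText = JSON.stringify(fruit);
console.log(jsonText);  //Prints {"kind":"grape","color":"red","quantity":12,"tasty":true}

//JSON text back to an object
var parsed = JSON.parse(jsonText);
console.log(parsed.quantity);  //Prints 12
```

Notice that the quotation marks around the index names appear in the text version even though the original object literal left them out.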

GeoJSON

Just as JSON is a formalization of existing JavaScript object syntax, GeoJSON is a formalized syntax of JSON objects, optimized for storing geodata. All GeoJSON objects are JSON objects, and all JSON objects are JavaScript objects.

GeoJSON can store points in geographical space (typically as longitude/latitude coordinates), but also shapes (like lines and polygons) and other spatial features. If you have a lot of geodata, it’s worth converting it to GeoJSON format for best use with D3.

We’ll get into the details of GeoJSON when we talk about geomaps, but for now, just know that this is what simple GeoJSON data could look like:

var geodata = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [ 150.1282427, -24.471803 ]
            },
            "properties": {
                "type": "town"
            }
        }
    ]
};

(Confusingly, longitude is always listed before latitude. Get used to thinking in terms of lon/lat instead of lat/lon.)
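Because GeoJSON is just nested objects and arrays, the same dot and bracket notation from earlier gets you to the coordinates. Here is a small sketch using the sample geodata above; feature, lon, and lat are hypothetical variable names.

```javascript
var geodata = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [ 150.1282427, -24.471803 ]
            },
            "properties": {
                "type": "town"
            }
        }
    ]
};

//features is an array, so start with bracket notation
var feature = geodata.features[0];

//coordinates is also an array: longitude first, then latitude
var lon = feature.geometry.coordinates[0];
var lat = feature.geometry.coordinates[1];

console.log(feature.properties.type + " at lon " + lon + ", lat " + lat);
```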

Next up: Making a bar chart →

These tutorials have been generously translated to Catalan (Català) by Joan Prim, Chinese (简体中文) by Wentao Wang, French (Français) by Sylvain Kieffer, Japanese (日本語版) by Hideharu Sakai, Russian (русский) by Sergey Ivanov, and Spanish (Español) by Gabriel Coch.