Array Indexing Methods In Java, R and Python

I’ve been working a bit more with R doing some data analysis. I keep getting hung up on the array access semantics and wanted to write them out so I could remember them.

Arrays are a very common data structure in computer science and nearly every language supports them. Most intro classes describe an array as a row of boxes where you can stick stuff and refer to each box by number.

For example, if I were storing three points in Java, I could write:

Point[] points = new Point[3];

The only syntactic support for arrays in Java is indexing: I can now refer to points[0], points[1] or points[2]. (Arrays are indexed from 0 because the indices were originally a shorthand for pointer arithmetic in C and computer scientists stuck with it.)

Python has syntactic support for slicing arrays (called “lists”), that is, for taking a subset of the elements. Some examples, each with a Java equivalent:

Python          Java                                                              Note
points[-2]      points[points.length - 2]                                         1
points[2:4]     points.length < 2                                                 2
                  ? new Point[0]
                  : Arrays.copyOfRange(points, 2, Math.min(points.length, 4))
points[4:2]     new Point[0]
points[2:-2]    points.length - 2 < 2
                  ? new Point[0]
                  : Arrays.copyOfRange(points, 2, points.length - 2)
points[2:]      points.length < 2
                  ? new Point[0]
                  : Arrays.copyOfRange(points, 2, points.length)
points[:2]      Arrays.copyOfRange(points, 0, Math.min(points.length, 2))

  1. Fails in both languages if the list has fewer than two elements, since the index would fall before the front of the list.
  2. In Java, Arrays.copyOfRange pads the output with null entries when the end index is beyond the end of the array, hence the Math.min.
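
To make the Python column concrete, here is a small sketch that runs each slice against a seven-element list. The list contents are made up for illustration (strings standing in for Point objects). It also shows that slices quietly clamp out-of-range bounds, while a single out-of-range index raises IndexError.

```python
# Stand-in data: seven strings instead of Point objects (illustrative only).
points = ["p0", "p1", "p2", "p3", "p4", "p5", "p6"]

second_to_last = points[-2]   # negative indices count from the end
middle = points[2:4]          # start is inclusive, end is exclusive
backwards = points[4:2]       # start past the end gives an empty list
trimmed = points[2:-2]        # from index 2 up to, not including, the last two
tail = points[2:]             # from index 2 to the end
head = points[:2]             # the first two elements

# Slices clamp to the list bounds instead of raising.
short = ["p0"]
clamped = short[2:4]          # an empty list, not an error

# A single index before the front does raise (note 1 above).
try:
    short[-2]
    index_failed = False
except IndexError:
    index_failed = True
```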

R has a slightly different array syntax. (R calls them “vectors” and reserves the name “arrays” for fixed n-dimensional arrays. The syntax used here for R is not R’s actual vector syntax, but the behavior is consistent.)

R uses i:j as a syntactic shortcut for seq(i, j), which counts from i to j inclusive in steps of 1 (or -1 when i > j). Arrays can then be indexed by arrays of indices.

Consider the array L = [a, b, c, d, e, f, g] defined in both languages:

Expression                  Python       R                        Note
L[2]                        c            b
L[-2]                       f            [a, c, d, e, f, g]
1:3                         Error        [1, 2, 3]
L[[1,3,5]]                  Error        [a, c, e]
L[1:3]                      [b, c]       [a, b, c]
L[-1:3]                     []           Error
L[3:-1]                     [d, e, f]    Error
L[3:1]                      []           [c, b, a]
L == g                      F            [F, F, F, F, F, F, T]
L[[T, F]]                   Error        [a, c, e, g]
L[[T, F, F, T, T, F, F]]    Error        [a, d, e]
L[L == g]                   a            [g]                      1

  1. In Python, L == g is False, and False is accepted as the index 0, so this returns the first element rather than failing.
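
The R column can be mimicked in Python to check the rows above. This is only a rough sketch: r_seq and r_index are hypothetical helpers written for this post, not part of any library, and they skip R details such as zero indices and out-of-range positions.

```python
def r_seq(i, j):
    """Mimic R's i:j: count from i to j inclusive, stepping by +1 or -1."""
    step = 1 if j >= i else -1
    return list(range(i, j + step, step))


def r_index(vec, idx):
    """Mimic R's vec[idx] on a plain Python list (hypothetical helper)."""
    if isinstance(idx, int):
        idx = [idx]                 # R has no scalars, only length-1 vectors
    if len(idx) == 0:
        return []
    if all(isinstance(k, bool) for k in idx):
        # Logical mask: recycle it along vec and keep the True slots.
        return [v for pos, v in enumerate(vec) if idx[pos % len(idx)]]
    if all(k > 0 for k in idx):
        # Positive indices are 1-based positions to keep, in the given order.
        return [vec[k - 1] for k in idx]
    if all(k < 0 for k in idx):
        # Negative indices are 1-based positions to drop.
        dropped = {-k for k in idx}
        return [v for pos, v in enumerate(vec, start=1) if pos not in dropped]
    raise ValueError("mixed-sign or zero subscripts are not supported in this sketch")


L = list("abcdefg")
second = r_index(L, 2)                          # ['b']
without_second = r_index(L, -2)                 # everything except 'b'
picked = r_index(L, [1, 3, 5])                  # ['a', 'c', 'e']
reversed_head = r_index(L, r_seq(3, 1))         # ['c', 'b', 'a']
every_other = r_index(L, [True, False])         # mask recycled over all 7 slots
only_g = r_index(L, [v == "g" for v in L])      # elementwise L == g, then mask
```

Calling r_index(L, r_seq(-1, 3)) raises ValueError, which lines up with the Error entries for L[-1:3] and L[3:-1] in the R column.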

Both Python and R allow assignment to slices of arrays. The behavior differs, however.

  • Python only allows an iterable to be assigned to a slice. Its elements replace the referenced section, expanding or contracting the original array as necessary:

    L[1:4] = [3] ⇒ [a, 3, e, f, g]

  • R replaces the referenced section with repetitions of the assigned list, so the array keeps its length. R warns if the length of the section being replaced is not a multiple of the length of the assigned list.

    L[1:4] = [3] ≡ L[1:4] = 3 ⇒ [3, 3, 3, 3, e, f, g]

    For example, I did some calculations on an array that I then wanted to take the log of and plot. Plotting negative infinity (log(0)) was causing some problems, so I wanted to replace all elements that were negative infinity with 0. Using the array syntax, that looks like:

    data[data == -Inf] = 0
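
The two assignment behaviors can be put side by side in Python. As before, this is a sketch under assumptions: r_assign is a hypothetical helper that imitates R's recycling on a copy of the list, and the data values are invented for the example.

```python
import math

# Python: the slice is replaced wholesale, so the list shrinks here.
L = list("abcdefg")
L[1:4] = [3]

def r_assign(vec, positions, values):
    """Mimic R's vec[positions] = values on a copy (hypothetical helper).

    positions are 1-based; values are recycled to cover every position.
    """
    out = list(vec)
    for n, pos in enumerate(positions):
        out[pos - 1] = values[n % len(values)]
    return out

r_like = r_assign(list("abcdefg"), [1, 2, 3, 4], [3])

# The log example. Python's math.log(0) raises ValueError instead of
# returning -inf, so -inf is filled in by hand to imitate R's log(0).
raw = [2.0, 0.0, 8.0]  # invented sample values
data = [math.log(x) if x > 0 else -math.inf for x in raw]

# Equivalent of R's data[data == -Inf] = 0.
cleaned = [0.0 if v == -math.inf else v for v in data]
```

After this runs, L is ['a', 3, 'e', 'f', 'g'] while r_like is [3, 3, 3, 3, 'e', 'f', 'g'], matching the two bullets above.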

All in all, I’m liking R’s array syntax. It would be nice to be able to index from the end of the array, but the flexibility of specifying an arbitrary list of indices outweighs that convenience.
