Integration of spatial information across vision and language