Dynamic multimodal object segmentation based on natural language referring expressions and its applications