We report on an exploratory study in which the first 60 seconds of a video recording of a user interaction are used to predict the user’s experienced task difficulty. This approach builds on previous work on “thin slices” of human-human behavior and applies it to human-computer interaction. In a scenario of interaction with a photocopy machine, automated video coding showed that the variables Activity and Emphasis predicted 46.6% of the variance in task difficulty. This result closely follows reported results on predicting negotiation outcomes from conversational dynamics, which used similar variables computed on the speech signal.