5.1 Discussions
On the whole, participants were able to engage creatively and productively with the majority of the visualisations. The responses given by both researchers and archivists indicate that the participants were able to envision how visualisations might support their work should they engage with email collections. At times, this usage was concomitant with existing practice, for instance, supporting existing activities or research questions. At others, the visualisations prompted participants to consider new perspectives on how they might engage with the data. Participant 4 regularly reflected on new areas of thought prompted by surveying the email collections through the use of visualisations, or, in some cases, by the creation of the visualisation itself. Similarly, Participants 2 and 3 noted several possibilities for integrating visualisations into the archival workflow, supplementing the catalogue, or providing a point of access for users. These findings support the established view [12] that visualisations encourage holistic, exploratory engagement with data: they allow users to work within existing modes of thought while also helping them to gain new insights and, potentially, to develop new approaches and questions in individual subject areas, cross-disciplinary research or professional practice.
In terms of the impact of different levels of privacy awareness on usefulness, the findings demonstrated that although each of the PrivCon levels achieved at least one score of 6, the distribution of the other scores varied quite dramatically and yielded unexpected results. It was postulated in [11] that ‘when considering email data from the perspective of humanities researchers, whose standard methodologies involve the close and usually manual examination of data, the scale of privacy may well be considered inversely related to the degree of useful access’. However, this empirical investigation demonstrated that the situation is more nuanced: usefulness depends as much on the underlying focus of the data and associated analysis as on the restrictions introduced by the privacy management strategy.
As revealed in [11], PrivCon 1 (particularly anonymisation, pseudonymisation and redaction) represents the most popular privacy management strategy employed by those conducting research into email collections through the use of visualisations. Most participants, however, viewed this redaction as removing key information (e.g. names) that was essential to their work. The sense, for most, was that simply viewing the overarching pattern made by individual data points was insufficient for detailed analysis within an arts & humanities and archival workflow context. To a degree, this might be mitigated by the use of a different technique, such as pseudonymisation, whereby participants would still be able to follow the threads of specific individuals even if those individuals were not explicitly named. This option, however, carries a greater risk of re-identification (cf. [84–87]). Conversely, Participant 2 acknowledged that the opportunity to redact content was beneficial in allowing the wider release of email data. In line with this, Participant 4 revealed a level of anxiety regarding the amount of information available at the lower PrivCon levels, especially as it pertained to disseminating their research. This indicated that the higher PrivCon levels might have specific purposes for the public-facing side of research or practice, after the data have been surveyed and analysed without the use of a filter. In fact, this follows the pattern found in many of the studies identified as associated with PrivCon 0 datasets in [11]. These papers would facilitate open access to the data for the researchers involved (often utilising the participants’ own email collections) and then anonymise, pseudonymise and/or redact content to allow for publication of examples. In terms of active research or practice, however, not only do these approaches provide a lower level of privacy for the data subjects, but they also provide little usability for follow-on work.
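The difference between redaction and pseudonymisation discussed above can be illustrated with a minimal sketch. It is not the method used in this study or in the cited works: the keyed-hash approach, the function names `pseudonymise` and `redact`, the secret key and the example addresses are all assumptions introduced purely for illustration. Under redaction every correspondent collapses to the same placeholder, whereas a stable pseudonym lets a reader follow one individual across messages without seeing their name.

```python
import hashlib
import hmac


def pseudonymise(address: str, secret_key: bytes, length: int = 8) -> str:
    """Map an email address to a stable pseudonym using a keyed hash (HMAC-SHA256).

    Illustrative only: the same address always yields the same pseudonym,
    so threads remain traceable, but the name itself is not shown.
    """
    normalised = address.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
    return f"person_{digest[:length]}"


def redact(address: str) -> str:
    """Replace every address with the same placeholder (no thread can be followed)."""
    return "[REDACTED]"


# Hypothetical sender/recipient pairs, not drawn from the study's dataset.
messages = [
    ("alice@example.org", "bob@example.org"),
    ("Alice@example.org", "carol@example.org"),
    ("bob@example.org", "alice@example.org"),
]

# Assumption: the key is held privately by whoever prepares the data.
key = b"illustrative-secret-key"

pseudonymised = [(pseudonymise(s, key), pseudonymise(r, key)) for s, r in messages]
redacted = [(redact(s), redact(r)) for s, r in messages]

for original, pseudo, red in zip(messages, pseudonymised, redacted):
    print(original, "->", pseudo, "|", red)
```

Because each pseudonym is stable, patterns of correspondence survive, and this is also why the approach carries a greater re-identification risk than redaction: a distinctive pattern can point to a known individual even when the name is hidden.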
The results for PrivCon 2 were most strikingly contrary to the expected relationship between privacy awareness and usefulness. Whilst, on the whole, not viewed as being quite as useful as PrivCon 0, visualisations in this category were well regarded by the participants. In one notable instance, the directed network graphs, the PrivCon 2 visualisation received a slightly higher score than PrivCon 0. The participants’ responses and proposed usages for this privacy awareness level suggest that this higher level of protection concurrently offers a greater range of opportunities for researchers and practitioners to engage with email collections. By grouping data points so that the individual is hidden in a crowd, this type of visualisation offers a summary or intermediary form of analysis that can inform and inspire the user in their work. Such holistic perspectives are increasingly proving valuable within the humanities with the advent of data-driven studies such as those associated with, to name a few areas, distant reading (cf. [95–97]), digital humanities (cf. [98–101]) or machine learning and AI (cf. [102–104]). In addition, an email collection results in a large, potentially unmanageable number of data points. The dataset utilised for this study, for example, was a small sample of the complete email collection (approximately 5.4%) and this, in turn, was a relatively small email collection compared to those that exist in more recent archival datasets [3, 105]. Even at the scale presented in this study, participants raised concerns about the level of detail present in some of the visualisations, the network graphs in particular, suggesting that they might become unsustainable if expanded to larger experiments. The introduction of interactive elements (e.g. the ability to zoom, re-centre or display hover-over information) is one way to mitigate these issues, but these demand a higher level of technical skill on the part of the creator of the visualisation, as well as greater hardware and software requirements. The amalgamated nature of PrivCon 2 style visualisations is another possibility, and one with both a high level of usability and privacy awareness.
The final PrivCon level explored in this paper, PrivCon 3, was regularly judged to be the least useful to the participants’ work. The reasoning behind this appears to be, in the first instance, a knowledge gap. There were a number of instances throughout the study where participants were uncertain about engaging with the visualisations. In fact, the majority of issues arose from the level of detail and context (or lack thereof) for the visualisations. The only issue where the participants consistently exhibited anxiety about their ability to comprehend the visualisation, both at Stage Two and Stage Three, was for PrivCon 3. Here participants expressed the need to understand more completely the processes underlying the generation of noise and how this might impact upon their analysis of the data.
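To make the participants’ concern about noise more concrete, the following is a minimal sketch of one common way of adding noise to counts before they are visualised. The mechanism used for the PrivCon 3 visualisations is not specified in this section, so the choice of Laplace noise, the `epsilon` parameter, the function names and the monthly counts are all assumptions made for illustration only.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts: dict[str, int], epsilon: float, rng: random.Random) -> dict[str, float]:
    """Add independent Laplace noise to each count.

    Assumption: a single email changes any one count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    scale = 1.0 / epsilon
    return {label: value + laplace_noise(scale, rng) for label, value in counts.items()}


# Hypothetical monthly email counts, not taken from the study's dataset.
monthly_emails = {"2001-01": 120, "2001-02": 95, "2001-03": 143}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for epsilon in (0.1, 1.0, 10.0):
    noisy = noisy_counts(monthly_emails, epsilon, rng)
    print(epsilon, {label: round(value, 1) for label, value in noisy.items()})
```

Running the sketch with different values of `epsilon` shows how far the plotted values can drift from the true counts, which is the kind of information participants indicated they would need before relying on such a visualisation in their analysis.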
Within these overarching patterns, there were some possible influencing factors or points requiring further investigation. There was one instance where PrivCon 1 was rated more highly than PrivCon 2 and that was in relation to the Word-Trees. This disparity from the overarching pattern is perhaps best accounted for by the removal of the reading panel for PrivCon 2. Similarly, there was evidence of anomalous results for the Mountain Graphs. Each PrivCon level for this set received very similar usefulness results from the participants. The distinction came from Participant 1 who gave a rating of 4 for PrivCon 0 and a 6 for levels 1 and 2. In principle, this is a truly intriguing result; however, when exploring their reasoning behind the score, there appears to have been some confusion given that under PrivCon 1 it is noted that ‘if it had the names/dates it would be very useful’. Each of these graphs does have the date included and the PrivCon 0 graph also has the names, but was given a lower usefulness score. Unfortunately, there is no reason given for the PrivCon 0 score. Additional investigation would be required to facilitate a more concrete analysis.