(set: $score to 0) Congratulations! We've succeeded through epically failing at Digital History for three months! Some of us have created our digital skills toolbox, while others have developed and supplemented what they already know. We've all come across common mistakes that we've made individually and as a class. We've learned that the right tool is the one that makes the most sense for our documents. Try out this trivia game to test your Digital History toolbox, your quick recognition of the tools we've learned, and your understanding of what works and what doesn't for different types of data. [[Question 1]] You are practising Markdown syntax by writing a short assignment in Markdown. You are using dillinger.io, a site that lets you see your Markdown source next to a preview of its output. You know that you need to use footnotes for your assignment, but the only solution you can find in your Programming Historian tutorial is reference-style links. You are following their instructions, which say to use two square brackets with the citation and an ID number, but it is not working. <img src="Question1Markdownmarkdown.png"> <img src="Question1Markdownpreview.png"> [[You have already created a footnote, you've just included too much information]] [[Markdown does not create the type of footnote you want]] (set: $score to $score +1) Correct answer! One point! To create a footnote with Markdown, you should modify the reference-style link function. Instead of giving Markdown a phrase to become a hyperlink reference, all we need to do is include the ID number of our footnote and note the bibliographic information that corresponds to its ID number. <img src="q1correctanswermarkdown.png"> <img src="q1correctanswerpreview.png"> [[Question 2]] (set: $score to $score +0) Wrong! No point for you! You can create regular footnotes in Markdown by modifying the reference-style link function.
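As a sketch (with a hypothetical citation, not one from the assignment), the modified reference-style pattern might look like this:

```markdown
Nova Scotia promoted itself as a tourist destination.[1]

[1]: Jane Doe, *A Hypothetical History of Tourism*, 2016, p. 42.
```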
Instead of giving Markdown a phrase to become a hyperlink reference, all we need to do is include the ID number of our footnote and note the bibliographic information that corresponds to its ID number. <img src="q1correctanswermarkdown.png"> <img src="q1correctanswerpreview.png"> [[Question 2]] Congratulations! Your score is: (print: $score) Next, you are working in the Command Prompt on your Windows machine. You have foolishly not made separate folders for all of the test files generated during Programming Historian tutorials. So now you're trying to count how many text files you have in a repository of your files so you can re-organize the messy repository full of different types of files. You know there are likely only a few, so they should be easy for you to count. You are trying to bring up a list of all the files in the repository so you can count them. <img src="q2screenshot.png"> What are you doing wrong and/or what could make your process better? [[You should use dir instead of ls because the command is different in Windows Command Prompt]] [[You should use dir instead of ls but you should also use a wildcard]] [[You should be able to use ls and then count the .txt files. Not sure why it isn't working]] Marvelous! Your score is (print: $score) You're developing your Digital History skills now and you want to take them to the next level. You've heard of topic modeling before and you want to give it a shot using MALLET. You're working on a project on tourism in Nova Scotia and you have 500 text files from the 2000-2015 Doers and Dreamers Nova Scotia Tour Guides. You want to confirm your hypothesis that by 2015, Nova Scotia's tourism industry had become one primarily concerned with promoting the region as 'modern'.
[[Topic modeling cannot fully give you that answer because your question is too specific]] [[Topic modeling cannot give you that answer because you are working with too many files]] (set: $score to $score +1) One point for partial correctness! You're partly right. Because you're using Windows, the command for a list of files is dir instead of ls. But from there, counting the text files among the list of your other files is cumbersome. To simplify things, use a wildcard: dir *.txt (again, rather than ls *.txt). <img src="q2rightanswerwildcard.png"> From here you can see your text files much more easily. [[Question 3]] (set: $score to $score +2) Correct! Two points! Using dir to get a list of files will bring up a messy collection of various files. Counting the text files among this list of your other files is cumbersome. To simplify things, use a wildcard: dir *.txt (again, rather than ls *.txt). <img src="q2rightanswerwildcard.png"> [[Question 3]] (set: $score to $score -2) You're making things too difficult! Lose two points. Because you are using the Windows Command Prompt, ls cannot possibly work. Further, to simplify your process, use a wildcard so that you don't have to count the .txt files mixed in with other file types. <img src="q2rightanswerwildcard.png"> [[Question 3]] Good thinking! (set: $score to $score +2) Topic modeling is not a magic solution for providing a quick answer to complex research questions. In <a href="http://programminghistorian.org/lessons/topic-modeling-and-mallet" target=_blank>the Programming Historian tutorial on topic modeling</a>, topics are treated less as subjects and more as discourses, as Ted Underwood and Lisa Rhody have argued. Topic modeling is a good way to show you where you should be focusing, but it cannot answer these kinds of specific questions. We could use it to see how the discourse of the guidebooks changes.
But ultimately, because topic modeling does not understand the meaning of words, we cannot over-rely on it. [[Question 4]] Sorry! (set: $score to $score -2) MALLET can handle hundreds of files at a time; in fact, more text files generally make for better models. Topic modeling is not a magic solution for providing a quick answer to complex research questions. In <a href="http://programminghistorian.org/lessons/topic-modeling-and-mallet" target=_blank>the Programming Historian tutorial on topic modeling</a>, topics are treated less as subjects and more as discourses, as Ted Underwood and Lisa Rhody have argued. Topic modeling is a good way to show you where you should be focusing, but it cannot answer these kinds of specific questions. We could use it to see how the discourse of the guidebooks changes. But ultimately, because topic modeling does not understand the meaning of words, we cannot over-rely on it. [[Question 4]] Current score: (print: $score) You are interested in finding the word frequencies of a large number of sources that you have been periodically putting into your Zotero account. You use code from the <a href="http://programminghistorian.org/lessons/counting-frequencies-from-zotero-items" target=_blank>Programming Historian Zotero tutorial</a> to count frequencies in your Zotero library. All of your sources contain the same fields as the following example: <img src="q4question.png"> Your code looks like this: <img src="q4code.png"> You get an error message. Why? [[You cannot do this with your User Zotero account]] [[You do not have a URL attached to your sources]] Check again! (set: $score to $score -2) The code needs a URL attached to each entry, and your sources have none. The tutorial and its corresponding code are designed to count Zotero *HTML* items; therefore a URL needs to be associated with your sources for it to work. <img src="q4question.png"> [[Question 5]] Right!
(set: $score to $score +2) The code you've used from <a href="http://programminghistorian.org/lessons/counting-frequencies-from-zotero-items" target=_blank>the Programming Historian Zotero tutorial</a> is written to count Zotero HTML items that have a URL associated with them. Because your sources do not have URLs, the code will not work. If you're using published sources, check if their publisher has a permalink you can enter. [[Question 5]] Nice work, your score is (print: $score) You have data on Nova Scotia place names from Thomas Brown's 1922 *Place-Names of the Province of Nova Scotia*, from <a href="https://archive.org/details/placenamesofprov00browuoft" target=_blank>the Internet Archive</a>. The data is an OCRed, alphabetically ordered, 172-page list. You are looking to organize it as a .csv file. This is what the data looks like: <img src="question5.png"> What is the best tool to organize it? [[Harness the power of OpenRefine]] [[Use a Word Processor with regular expression function]] Maybe not the least stress-inducing choice... (set: $score to $score -2) Because you already have an OCR file where the structure is apparent, using OpenRefine is not necessary. For minor changes to a small amount of text, a word processor has the functions that you need. Remember, the right tool is the simplest one that provides the results you need, not the one that may over-complicate your task. [[Question 6]] Correct! (set: $score to $score +2) Because your data already has a latent structure, you do not need to use OpenRefine. It will likely be simpler to use a word processor, such as LibreOffice, that understands regular expressions. You can use search and replace to make the minor adjustments left over from the OCR process. [[Question 6]] Not done yet! Your score is (print: $score) You are studying campus culture at Carleton University in the 1960s.
The <a href="https://archive.org/details/carletonlibrary" target=_blank>Carleton University Internet Archive collection</a> has The Raven, Carleton's yearbook, for the years 1960-1969. You want to download only those ten text files, one for each year, and use AntConc to do text analysis and come up with some research questions. According to <a href="http://programminghistorian.org/lessons/corpus-analysis-with-antconc" target=_blank>Heather Froehlich</a>, what might be the best way for you to obtain the text files for your text analysis? [[Datamine the Internet Archive collection to scrape all of the files]] [[Copy and paste the text into a text editor]] Good thinking! Earn one point. (set: $score to $score +1) Since you will only need ten text files from the Carleton University Internet Archive collection, it is a good idea to create the text files on your own so you quickly and easily become acquainted with the data you will be working with. All you need to do is copy and paste the "Full text" OCR from the Internet Archive and save it in a text editor. In our case, the data is not messy. All you need to do is remove the advertisement lines at the end and the metadata at the top. This is likely easier to do by hand than getting into more complicated tools like OpenRefine. While datamining the Internet Archive to download all of the files is an option here, <a href="http://programminghistorian.org/lessons/corpus-analysis-with-antconc" target=_blank>Heather Froehlich's AntConc tutorial</a> recommends working with the text files by hand if your collection is small. [[Question 7]] Maybe not the most beneficial choice! Lose one point. (set: $score to $score -1) While datamining the Internet Archive to download all of the files is an option here, <a href="http://programminghistorian.org/lessons/corpus-analysis-with-antconc" target=_blank>Heather Froehlich's AntConc tutorial</a> recommends working with the text files by hand if your collection is small.
Since you will only need ten text files from the Carleton University Internet Archive collection, it is a good idea to create the text files on your own so you quickly and easily become acquainted with the data you will be working with. [[Question 7]] Your score: (print: $score) Keeping with the last question, you have entered the ten text files into AntConc. You are interested in looking at discussions of racism and sexism on campus in the 1960s. Within AntConc, you use the 'Collocates' feature to understand the statistically likely context within which words that begin with 'sex' are found, using the wildcard feature. For race, a wildcard search with 'rac*' doesn't bring very narrow results, so you try 'racism' instead. These are the results you get: <img src="Question7racism.png"> <img src="Question7sex.png"> What research leads/questions can you confidently make? [[Racism was most likely not part of campus culture discourse in the 1960s]] [[The word 'sex' is not used to describe discrimination in the 1960s, why might that be?]] Rethink your assumption. Minus two points. (set: $score to $score -2) This is a broad statement to make. Returning to <a href="http://programminghistorian.org/lessons/corpus-analysis-with-antconc" target=_blank>Heather Froehlich's tutorial</a>, she argues that the files we use can shape our research questions. While racism may not have been part of campus culture in the 1960s, the information we have is not enough to make that assumption. We could review our text files and possibly clean them up more, or we could compare the files to the 1970s yearbooks to get an idea of why racism as a term is not found in the 1960s yearbooks. Maybe it's not something addressed generally in Carleton yearbooks, or maybe the discourses change in the 1970s. But at this stage, we cannot be sure. [[Question 8]] Good thinking, gain two points!
(set: $score to $score +2) This is a reasonable research question that does not put too much faith in your text analysis. You can confidently say that the context of derivatives of 'sex' is not generally associated with discrimination in the 1960s yearbooks. To further your understanding, you could make sure that your text files are perfectly cleaned, or you could look to the 1970s yearbooks and compare that corpus with the 1960s yearbooks to see if this context changes. [[Question 8]] Good work, your current score is (print: $score) You've come across a complete collection of the earliest <a href="https://novascotia.ca/archives/Piers/default.asp" target=_blank>accession records</a> for the Nova Scotia Museum from 1900-1939, created by the first Provincial Archivist of Nova Scotia, Harry Piers. His accession ledgers have been digitized by the Nova Scotia Archives. The documents all have metadata in their structure, with the categories: Scientific Name, Common Name, Department and Phylum, Locality and When Collected, Collector, Donor, Received, No. of Specimens, and Remarks. There are nearly 9500 specimen and artifact records! For now, you want to look into just the first year of the museum's collecting, 1900. Here is an example of one of his earliest entries, accession <a href="https://novascotia.ca/archives/Piers/book.asp?ID=1" target=_blank>numbers 78-92</a>. <img src="Piersaccessionnumber78.png"> You are planning on creating an OCRed version of the data, and from there you want to structure it. Let's assume that you have a very good OCR software that can recognize neat handwriting. What tool should you use? [[OpenRefine]] [[Regular expressions with the OCR scans]] [[Use Python to change the OCR scan's structure into 'dictionary' structure]] Sorry! Lose two points! (set: $score to $score -2) OpenRefine works with spreadsheet files such as .tsv and .csv files.
Our metadata is historical metadata rather than the metadata we get from digital resources in these file formats. [[Question 9]] Sorry! Take away two points. (set: $score to $score -2) Using regular expressions will not help you much here because the data is not regular! For example, performing a search and replace on the OCRed text may pick up something that you actually do not want changed. This is a point that is stressed in Jon Crump's Programming Historian tutorial, <a href="http://programminghistorian.org/lessons/generating-an-ordered-data-set-from-an-OCR-text-file" target=_blank>"Generating an Ordered Data Set from an OCR Text file"</a>. [[Question 9]] Nice catch! Two points! This is the tool you would want to put your efforts into using. (set: $score to $score +2) Because this is a historical metadata source, it does not have the inherent structure that modern metadata does. You also must create an OCR version. Because the data still does not really have a structure, creating a Python dictionary gives each type of metadata its own field and makes the data easier to work with in the future. It is also important to keep in mind that handwritten text is more difficult to OCR. Creating a Python dictionary from an OCR text is covered in Jon Crump's Programming Historian tutorial, <a href="http://programminghistorian.org/lessons/generating-an-ordered-data-set-from-an-OCR-text-file" target=_blank>"Generating an Ordered Data Set from an OCR Text file"</a>. [[Question 9]] Congratulations! Your score is: (print: $score) Let's return to the 1922 Nova Scotia place names data from Question 5. <img src="question5.png"> After having used a word processor to give structure to the data with regular expressions, you're interested in quickly applying a visualization tool to your structured .csv data for future presentations. Which tool should you use?
[[Palladio]] [[Another data visualization tool]] You could, technically, but it's not the best tool for you! (set: $score to $score -1) Because Palladio is a new tool, it does not have the functionality to make presentations. Rather, visualizations can only be exported in static form. Also, because you need a visualization in a quick time frame, a tool like Palladio will likely take up too much time because it still contains bugs! [[Final Round]] This is probably the safest bet! (set: $score to $score +1) Because Palladio is a new tool, it does not have the functionality to make presentations. Rather, visualizations can only be exported in static form. Also, because you need a visualization in a quick time frame, a tool like Palladio will likely take up too much time because it still contains bugs! You're better off using something that's been around for longer, with more tools for presentations! [[Final Round]] You made it! Your score sits at (print: $score) As a final question: after months of fighting to become Digital Historians (yes! I mean you!), what's the most important key to success in digital history? [[Ask a friend in the class or outside of the class. Get beer with said friends, buy them chocolate. Eat the chocolate together. Don't go it alone!]] [[Spend hours at home wanting to throw your computer out the window]] Of course! Two points for your ingenuity. (set: $score to $score +2) But, if all else fails, remember this: <img src="theitcrowd.png"> You shouldn't be here. Don't throw your computer out the window.
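A postscript for Question 8: as a minimal sketch (not Jon Crump's tutorial code), the 'dictionary' structure for a Piers accession record might look like this in Python. The field names come from the ledger categories listed above; the values here are placeholders, not transcribed from the actual ledger.

```python
import csv

# One accession record as a Python dictionary. Field names follow the
# ledger categories; the empty values are placeholders only.
record = {
    "Scientific Name": "",
    "Common Name": "",
    "Department and Phylum": "",
    "Locality and When Collected": "",
    "Collector": "",
    "Donor": "",
    "Received": "",
    "No. of Specimens": "",
    "Remarks": "",
}

def write_records(records, path):
    """Write a list of record dictionaries out as a .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
```

Once each ledger entry is parsed into a dictionary like this, a list of them can be handed to `write_records` to produce the kind of spreadsheet file that tools such as OpenRefine or Palladio do understand.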