Tips and Tricks

Here are a few tips to consider before you start using Helena. The best way to learn about using Helena is to watch tutorials, but once you've got the basics and want to start writing your own programs, it's helpful to keep these recommendations in mind.

Use a fresh Chrome profile

Here are directions for making a new Chrome profile: https://support.google.com/chrome/answer/2364824. We recommend that you make a fresh profile that you will use only for Helena tasks.

Keep programs small

To make faster, easier-to-debug Helena programs, try to keep them as short as possible. For example, if you want to scrape a list of search results, don't record yourself executing the search; do the search before you start recording, then bookmark the page of search results, then start your recording there.

Open new tabs

Always open new pages in new tabs when possible. Helena programs are faster and work better - and sometimes they only work - when new pages are opened in new tabs. So when you click on links (or perform other interactions that open new pages), remember to hold down your browser's modifier key for opening in a new tab while you click - 'command' for Mac Chrome, 'control' for others.

Interact with the page, not the browser

Helena can only observe your interactions with the webpage, not your interactions with the browser (Chrome). This means if you right-click on an element and interact with the right-click menu, Helena can't see what you're doing and can't write a program that will reproduce the actions. The one exception is that Helena can observe when you use the "Back" button. But since opening new tabs (see above) is the best practice, you shouldn't need to use the "Back" button that often!

The browser vs. page issue often arises when folks use the right-click menu to open a link in a new tab, rather than using the 'command' or 'control' keyboard shortcut; so remember to use the keyboard shortcut for this.

It might also arise if you're trying to download items and end up interacting with the "Save As" menu, which Helena can't record. In this case, you have a couple of options: (i) Change some Chrome settings so that your materials download automatically, without requiring a "Save As" interaction (for instance, you can tell Chrome to download PDFs to the Downloads folder instead of opening them in a new tab by turning on a setting at chrome://settings/content/pdfDocuments, as discussed below). Or (ii) use Helena's keyboard shortcuts to copy the URL for the material you want to download, then use a different tool to download all the URLs you collect.
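If you go with option (ii), the "different tool" can be a short script. Here is a minimal Python sketch, assuming you have saved your scraped output as a CSV file in which one column holds the URLs you collected. The function names, the column index, and the output folder are all hypothetical choices for illustration, not part of Helena.

```python
import csv
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def filename_for(url, index):
    """Build a local filename from the URL path, prefixed with the row index
    so that two URLs ending in the same name don't overwrite each other."""
    name = os.path.basename(urlsplit(url).path) or "download"
    return f"{index:04d}_{name}"


def download_all(csv_path, url_column, out_dir, fetch=urlretrieve):
    """Download every http(s) URL found in one column of a CSV file.

    Assumption: the URLs live in column number `url_column` (0-based).
    Rows without a URL in that column (a header row, blank cells) are skipped.
    `fetch` is any callable taking (url, target_path); it defaults to
    urllib.request.urlretrieve.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with open(csv_path, newline="") as f:
        for index, row in enumerate(csv.reader(f)):
            if len(row) <= url_column:
                continue
            url = row[url_column].strip()
            if not url.startswith(("http://", "https://")):
                continue
            target = os.path.join(out_dir, filename_for(url, index))
            fetch(url, target)
            saved.append(target)
    return saved
```

For example, download_all("output.csv", 1, "downloads") would fetch every URL in the second column of a file named output.csv. Some sites require a login or block scripted downloads, so this won't work everywhere.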

Update Chrome

Helena has only been tested on recent versions of Chrome, so you may run into problems if you try running Helena with an old version.

View output data in Google Sheets

Unfortunately Excel isn't always great at showing multi-line cells. Since Helena uses multi-line cells to preserve as much of the original structure of your scraped data as possible, you'll probably see plenty of multi-line cells in your output. Before you start worrying that your data has disappeared, try using a non-Excel viewer (such as Google Sheets) to show your data.
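If you'd rather check your data without opening any spreadsheet program, a few lines of Python can confirm that the multi-line cells are all there. This is a minimal sketch, assuming your output is a CSV file; the function name summarize_cells is made up for this example.

```python
import csv
import io


def summarize_cells(csv_text):
    """Return (number of rows, number of cells that contain a line break).

    The csv module keeps line breaks inside quoted cells, so a multi-line
    cell comes back as a single string containing a newline character.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    rows = list(reader)
    multiline = sum(1 for row in rows for cell in row if "\n" in cell)
    return len(rows), multiline
```

To run it on a file, read the file with open(path, newline="") and pass the text to summarize_cells. If the row count matches what you expected to scrape, your data is intact even if Excel displays it oddly.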

Adjust your Chrome settings if you want to download PDFs

Helena automation can be a nice way to download large numbers of PDFs, rather than accessing them all manually. If you want to use Helena this way, remember that by default, Chrome will open PDFs in a new tab. If you want to download them instead, you'll have to turn off Chrome's PDF viewer in the settings for your Helena-specific Chrome profile. On recent versions of Chrome, just switch to the Helena-specific profile, load the URL chrome://settings/content/pdfDocuments, then turn on the "Open PDFs using a different application" setting.

Next/More interaction not working? Try a list of URLs.

Often, websites will use entirely separate URLs for each page of data you want to scrape. So, if showing Helena Next/More results doesn't work for you, consider the "Upload an Additional Table" option. Upload a CSV listing all the URLs (aka all the webpages with data you want to scrape). Make sure the CSV (i) has no header and (ii) lists the very first URL *identically* to the page on which you demonstrated your scrape! See these tips for editing tables.
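If the URLs follow a pattern (say, a page number that counts up), you can generate the CSV rather than typing it by hand. Below is a minimal Python sketch; the example.com URL pattern and the function names are hypothetical, so substitute the pattern your website actually uses. It writes one URL per row with no header, and puts the URL you demonstrated on first, exactly as you give it.

```python
import csv


def build_url_rows(first_url, url_template, pages):
    """Return one-column rows: the demonstration URL first, then one URL
    per page number. `url_template` must contain a {page} placeholder."""
    rows = [[first_url]]
    for page in pages:
        url = url_template.format(page=page)
        if url != first_url:  # don't list the demonstration page twice
            rows.append([url])
    return rows


def write_url_csv(rows, out):
    """Write the rows to an open file object, with no header row."""
    csv.writer(out).writerows(rows)


if __name__ == "__main__":
    rows = build_url_rows(
        "https://example.com/results?page=1",       # the page you demonstrated on
        "https://example.com/results?page={page}",  # hypothetical URL pattern
        range(1, 11),
    )
    with open("urls.csv", "w", newline="") as f:
        write_url_csv(rows, f)
```

Copy the first URL straight from the address bar of the page you recorded on, so that it matches character for character.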

Try to sort the data on a web page in different ways.

Websites such as Psychological Science, Nature, or Google Scholar list articles in a particular order, e.g., "Oldest First" vs. "Newest First" vs. "In Relevance." Try loading the pages and using Helena with each of these options. Some orderings might be more stable than others (that is, the results come back in a consistent order every time you fetch data from the website), and some might use more consistent layouts. Either one makes for an easier time with Helena.

Play around!

Now and then you'll run into a website feature that we haven't yet had the chance to make Helena handle. In those cases, try a few different ways of doing the task. Is there another way to navigate to the page you want? Can you use the keyboard instead of the mouse to do what you want, or the mouse instead of the keyboard? Try doing the task a few different ways and see what happens. And if Helena still isn't doing the right thing after a few different recordings, then try the last tip. :)

Play with the URL!

In the spirit of the last tip... this tip is especially pertinent if you are trying to scrape data from multi-page search results (for example, getting a long list of all Psychological Science papers published in a given year) but the Next/More interactions don't work the way you expect. Sometimes the URL has "hidden" in it the number of results to display at a time. Why be constrained by the options the website gives you (often 20, 50, or 100)? Just change the value in the URL to show more (maybe all? :)) of your data at a time. For example, look for the "pageSize" argument in the URL here: "https://journals.sagepub.com/action/doSearch?field1=AllField&text1=+&field2=AllField&text2=&publication[]=pssa&Ppub=&Ppub=&AfterYear=2019&BeforeYear=2019&earlycite=on&access=&ContentItemType=research-article&pageSize=139".
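Editing the URL by hand in the address bar works fine. If you want to try several values, or build a list of URLs, here is a minimal Python sketch that changes one argument in a URL and leaves the rest alone. The function name set_query_param is made up for this example, and "pageSize" is just the argument name from the URL above; other websites will use other names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_param(url, name, value):
    """Return `url` with the query argument `name` set to `value`.

    Existing arguments keep their order; blank ones (like "text2=") are kept.
    If `name` is not already in the URL, it is appended at the end.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    updated = [(k, str(value)) if k == name else (k, v) for k, v in pairs]
    if not any(k == name for k, _ in pairs):
        updated.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(updated)))
```

For example, set_query_param(url, "pageSize", 500) asks for 500 results per page. One thing to know: the rebuilt URL may spell some characters differently (for instance "[]" becomes "%5B%5D"), which websites normally treat as the same thing. Also, a website may cap the page size no matter what value you request, so check that the page really shows as many results as you asked for.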

Ask questions

When in doubt, get in touch! Helena is still new and under development, so there are lots of bugs, but we're happy to help! We're always looking for new uses that stress our tool, so please contact us when you find problems. We'll see what we can do to fix your issue.
