A simple changelog of the Import.io product for 2016.
##19 December 2016
[improvement] Upgraded database
Upgraded one of our databases to improve the performance and security of the app.
[improvement] Upgraded the Editor to use HTTPS
Now, the whole application works under HTTPS.
[bug fix] Unable to add more columns to an extractor
Fixed a problem that prevented users from adding more columns to an extractor beyond a certain number.
[bug fix] URL Generator produces bad URLs when user selects parameters in the main domain
Fixed a problem where the URL Generator produced bad URLs when parameters were selected in the main domain.
##18 December 2016
[bug fix] When pasting a URL on the new Extractor modal the Go button remains disabled
Fixed a problem where the Go button remained disabled when pasting a URL on the New Extractor modal.
##16 December 2016
[bug fix] Multiple crawl runs running in parallel
Fixed a problem that caused multiple crawl runs to run in parallel.
[improvement] New status message after stopping a crawl run
Added an improved status message that is shown while a crawl run is stopping.
[improvement] Improved query counter message for overages
Improved the query counter to show a cleaner message when users run into overages.
[bug fix] Dashboard refresh makes it hard to see the preview
Fixed a problem that prevented the preview from being displayed correctly.
[bug fix] Download button is not available after stopping a crawl run
Fixed a problem that prevented the download button from displaying after stopping a crawl run.
[bug fix] Request premium trial twice
Fixed a problem that allowed users to request a premium trial multiple times.
[improvement] Improved Live Query API backend
Improvements on the Live Query API backend.
[bug fix] Incorrect count for “No data extracted” URLs
Fixed a problem that caused an incorrect count of URLs that rendered successfully but from which no data was extracted.
[bug fix] Preview only shows one row
Fixed a problem where the preview only showed one row for some URLs.
[improvement] Several backend improvements
General improvements on our backend to increase the success rate of crawl runs.
##15 December 2016
[bug fix] Problem with downgrading on the new subscription page
Fixed a problem that prevented users from downgrading from the Billing and Subscription page.
[bug fix] Typo on the reset password form
Fixed a typo on the reset password form.
##9 December 2016
[improvement] Crawl run time counter extended
Added support for days on the crawl run time counter.
##6 December 2016
[improvement] Monthly query limit warning is now dismissible
You can now dismiss the monthly query limit warning on the Dashboard.
[improvement] Improved error messages
Improved error messages for unexpected errors.
[bug fix] Delete an extractor while running causes an inconsistent state
Fixed a bug that allowed an extractor to be deleted while it was running.
[new feature] Crawl runs now show successful and failed URL counts
Crawl runs now show both the successful and failed URL counts.
[improvement] Clicking on a link now opens it in a new tab
Clicking on a link now opens it in a new tab.
[bug fix] Null value is generated when empty URL parameters are used
Fixed a bug with the URL Generator when using empty values.
[bug fix] Wrong URLs are generated when “step” field is populated with double values
Fixed a bug with the URL Generator when using double values on a step.
[bug fix] Run history should display 30 crawl runs
Fixed a bug that prevented the run history from displaying 30 crawl runs correctly.
[bug fix] URL count tooltip does not display correctly
Fixed a bug that prevented the URL count tooltip from displaying correctly.
##2 December 2016
[new feature] Timestamp on crawl runs
Crawl runs now show the time at which they started.
[bug fix] Query counter does not refresh
Fixed a bug that prevented the query counter from refreshing when the crawl run completed.
##1 December 2016
[improvement] Free users can start a crawl run even if they don’t have enough queries
Free users can choose to start a crawl run and use all their queries even if they don’t have enough to run through all the URLs.
[bug fix] Scheduled extractor did not run
Fixed a bug that prevented scheduled extractors from running when scheduled.
##30 November 2016
[improvement] Query reset counter
The improved query counter now displays overages on a separate counter.
[improvement] Data preview improvements
Data preview is now available as soon as the run has 20 rows of data.
[new feature] Hourly schedules
You can now schedule your extractor on an hourly basis.
[improvement] Extended extractors limit
Up to 1,500 extractors are now visible on the product.
[improvement] General improvements on the URL Generator
The URL Generator now has a smarter engine that makes better use of memory and CPU, which keeps the UI more responsive when generating large numbers of URLs.
[bug fix] Encoded values on the URL Generator
Fixed a bug where some characters were encoded by the URL Generator.
[bug fix] Dashboard crashes in Safari
Fixed a bug that prevented the app from working properly in Safari.
[bug fix] Cannot log out
Fixed a bug that prevented users from logging out of the app.
##29 November 2016
[improvement] Improved onboarding form
General improvements to the onboarding form.
[bug fix] Cannot reset password
Fixed a bug that prevented users from resetting their passwords.
##28 November 2016
[new feature] New My Account page
A new My Account page now contains all your account details, including subscription and billing information.
[bug fix] Add URLs loses focus when adding a URL
Fixed a bug that made it hard to add new URLs in the Editor.
[bug fix] Wrong redirect when page render fails
Fixed a bug that redirected the user to the wrong place when the page could not be rendered.
[new feature] Enable/disable Get link URL for columns
A new menu option on columns lets you control whether the link URL is included.
##4 November 2016
[bug fix] API Key resets itself
Fixed a bug that caused the API Key to reset itself randomly.
[improvement] Improved crawl run history UI
Switched to icons in the crawl run history for a cleaner Dashboard.
[bug fix] Subscription query renewal dates are not in sync with billing dates
Fixed a bug where the query renewal dates were out of sync with the billing renewal dates.
##3 November 2016
[bug fix] Data not displayed on the Website view when using XPath
Fixed a bug that prevented data from being displayed on the Website view when using XPath.
[bug fix] Wrong number of rows on my extractor runs
Fixed a bug with extractor runs showing an incorrect number of rows.
##31 October 2016
[bug fix] Cannot login using email with new domain extensions
Fixed a bug that prevented users from creating new accounts using an email address with a new domain extension.
[improvement] Improved user menu
We made some minor improvements to the user menu.
##28 October 2016
[bug fix] Cannot login using email with new domain extensions
Fixed a bug that prevented users from creating new accounts using an email address with a new domain extension.
[improvement] Improved page rendering times
We updated our backend systems to improve the time it takes to render a page, reducing both training and extraction times.
##26 October 2016
[new feature] Elapsed and estimated remaining time on Runs
You can now see how much time has elapsed and an estimate of how much time is left for your Run to complete.
[improvement] Duplicating an extractor no longer runs it by default
When you duplicate an extractor it no longer starts running by default.
##19 October 2016
[bug fix] Cancel fails for runs with a very large number of URLs
Fixed a bug that prevented the cancel function from working properly for Runs with a very large number of URLs.
[bug fix] Download not available for successful scheduled runs with a large number of URLs
Fixed a bug that prevented the download function from working properly for scheduled Runs with a very large number of URLs.
##16 October 2016
[bug fix] Preview & Log files links are available when no files are available
Fixed a bug that enabled preview and log files links in some cases where no data was available for them.
[bug fix] “You are over your limit” banner wrongly displayed
Fixed a bug that made the “You are over your limit” banner display when it shouldn’t.
[improvement] Streamlined onboarding login flow
The onboarding login flow was reduced to a single screen to streamline the process.
##15 October 2016
[bug fix] Using links created via XPath or Regex no longer breaks chained extractors
Using URL links created via XPath or Regex caused the chaining functionality to fail. The issue has been fixed.
##10 October 2016
[bug fix] Using manual XPath no longer breaks Data View
Using manual XPath caused the Data View to stop working properly in some cases. The issue has been fixed.
##5 October 2016
[improvement] Improved scheduled runs logic
If a scheduled run was stuck for any reason, the following scheduled runs would not start. We improved the run monitoring logic to make sure runs don’t get stuck any more.
##29 September 2016
[bug fix] URLs containing spaces now work
URLs containing spaces in their fragment (after the # sign) used to cause problems, in particular around the URL generator. The issue has been fixed.
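For illustration only, here is a minimal Python sketch of one way to make such a URL safe: percent-encode the fragment while leaving the rest of the URL untouched. The function name `encode_fragment` and the example URL are hypothetical, and this is not a description of how the product handles it.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_fragment(url):
    """Percent-encode unsafe characters (such as spaces) in the fragment only."""
    parts = urlsplit(url)
    # Keep characters that are legal in a fragment, plus '%' so that
    # already-encoded sequences are not encoded a second time.
    safe_fragment = quote(parts.fragment, safe="/?:@!$&'()*+,;=%")
    return urlunsplit(parts._replace(fragment=safe_fragment))


# Hypothetical URL with a space after the # sign.
cleaned = encode_fragment("https://example.com/list#page 2")
print(cleaned)
```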
##19 September 2016
[new feature] Notification when account suspended
In the (rare) event that an account is suspended, the user will receive a notification at the next login.
##13 September 2016
[new feature] API access to JSON file
The API now exposes the JSON for the latest run in addition to the CSV version. See the Integrate tab for your extractor.
[bug fix] URL generator deals properly with complex URLs
The URL generator now deals properly with URLs that contain a fragment, which is the part of the URL after a # sign.
##12 September 2016
[new quota] Modified the number of queries
We adjusted the number of queries for Trial and Free plans.
##7 September 2016
[bug fix] Fixed bug affecting simple chaining runs
We fixed a bug that prevented simple chaining runs from starting.
[improvement] Better retry
When retrying a page, we no longer use the cached version.
##6 September 2016
[bug fix] Restarting page rendering
We fixed a bug that caused the page rendering service to fail and not restart.
##5 September 2016
[bug fix] Some scheduled runs got stuck
We fixed a bug that sometimes caused scheduled runs to get stuck.
[improvement] Better error reporting
We split out the errors from the Webcache in the new crawler.
[bug fix] Fixed a site that could not load
##20 August 2016
[bug fix] CSV and JSON failing to load
We fixed a bug that caused some large jobs to appear to fail: even though the job succeeded, the CSV and JSON files did not show as available for download. All fixed now!
##19 August 2016
[bug fix] Simple chaining failing to start
We fixed an issue that prevented some Simple Chaining jobs from starting.
##18 August 2016
[bug fix] Correct number of queries and renewal date
We fixed a bug that caused the renewal date and the number of queries left in a plan to show up incorrectly.
##8 August 2016
[improvement] Better extractor names
New extractors now have better default names; for example, we remove the initial “www.”.
[new feature] Ability to download URLs
When you need to copy all the URLs in the URL box to some other tool, we now provide the ability to download them as a simple text file. So even if you generate tens of thousands of URLs with the URL generator, you’ll get them all with one click.
[improvement] Removing link to dashboard in app
For those of you using the app, we have removed the link to the dashboard which was confusing.
[improvement] Better search for extractor names
We now provide full-text search on all extractor names, so typing “ik” will match “www.ikea.com/…”.
##28 July 2016
[bug fix] Latest CSV API now handles files larger than 6MB
Resolved a bug where files larger than 6MB were producing the error [body size is too long] via the API. The fix means some small programmatic adjustments may be needed, depending on how you are calling the API. See this article for details.
##21 July 2016
[improved navigation] Link back to Dashboard from My Data
For users who were with us before April 2016 and who have the Desktop app, there is now a link to switch back to the Web Extractor Dashboard in the top right-hand corner of your My Data page. See here.
NB: The Web Extractor Dashboard does not work in the desktop app browser.
##12 July 2016
[new feature] Simple Chaining
We have released a new way to feed an extractor: point to an existing extractor and choose one of its columns containing URLs. This makes it easier to “chain” extractors, where the output of one extractor provides the list of URLs for the next one.
Learn to use Simple Chaining with this tutorial.
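As a rough illustration of the chaining idea, the Python sketch below reads the CSV output of a first extractor and collects one column of URLs to use as the input list for a second one. The column name `detail_url`, the sample data, and the helper `urls_from_column` are hypothetical; this only shows the concept, not how Simple Chaining is implemented.

```python
import csv
import io


def urls_from_column(csv_text, column):
    """Return the non-empty values of one column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row.get(column)]


# Hypothetical output of a "listing" extractor.
listing_csv = (
    "name,detail_url\n"
    "Chair,https://example.com/p/1\n"
    "Table,https://example.com/p/2\n"
    "Lamp,\n"
)

# These URLs would become the input list of the next extractor in the chain.
next_urls = urls_from_column(listing_csv, "detail_url")
print(next_urls)
```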
##11 July 2016
[new feature] Shortcuts for regular expressions
Regular expressions are very powerful, but not everyone is comfortable writing them. So we provide a number of shortcuts for common types like number, currency, URL… Try it and feel the power!
[new feature] CSV from last run
In the Integrate tab for an extractor, it is now possible to get a URL that will download the CSV for the last successful run of this extractor.
[new feature] Google Sheets integration
In the same location, one can also obtain a formula that can be pasted into Google Sheets to fetch the data from the last successful run of the extractor.
##7 July 2016
[improvement] New URL Generator
We are proud to introduce a new URL Generator. It makes it possible to specify any part of a URL as a parameter. Rumor has it that it is now possible to generate hundreds of thousands of URLs without the dashboard becoming unresponsive.
[improvement] More space for the extractor name
Feel free to give your extractors meaningful names!
[new feature] Scheduling icon
Scheduled extractors now display a small “clock” icon. Mouse over to see when the next run is scheduled.
##29 June 2016
[improvement] Crawl infrastructure update
Many stability improvements including tweaks to memory usage, log rotation, improved deployment behaviour and autoscaling.
##22 June 2016
[new feature] Extracting “alt text” for an image
We have added a feature that automatically extracts the “Alt text” for an image. This is useful when you want to get hold of data that is displayed in a visual format e.g. star ratings on Trip Advisor. The “alt text” is made available as an additional field / column in the JSON / CSV.
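To show what “alt text” means in practice, here is a small Python sketch that collects the `alt` attribute of every image in a piece of HTML using the standard library parser. The class name `AltTextCollector` and the sample markup are hypothetical, and the Web Extractor's own mechanism may work differently.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect the non-empty alt attribute of each <img> tag."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alt_texts.append(alt)


# Hypothetical markup: a star rating shown only as an image.
collector = AltTextCollector()
collector.feed('<div><img src="stars.png" alt="4 out of 5 stars"><img src="logo.png"></div>')
print(collector.alt_texts)
```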
[bug fix] CSV download issues
Fixed a number of bugs that were causing issues with the CSV output (including a missing Source URL column and row sorting issues).
##17 June 2016
[bug fix] CSV data was mixed up, now fixed
In some cases when a column had only a few pictures, the data could get misaligned among columns. Fixed.
##9 June 2016
[new feature] Duplicate extractors
Extractors can now be copied. This makes it easy to copy and modify one of your existing Extractors, or even copy someone else’s Extractor into your own Dashboard.
[new feature] RSS feed
RSS feed of your latest Extractor runs is now available via the Integrate tab inside Dashboard. RSS is useful, for example, for Zapier integration.
[new feature] HTML output support
Advanced Web Extractor option: output raw HTML of the selected elements.
[new feature] Default column value support
Advanced Web Extractor option: set a default value to use instead of outputting a blank text string when a column value is empty.
[new feature] Required column value support
Advanced Web Extractor option: exclude a row from your dataset if a particular column value is empty.
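The default-value and required-value options can be pictured with the Python sketch below. It is only an assumption about the behaviour described above: the helper `apply_column_rules`, the column names, and the choice to check required columns before filling defaults are all hypothetical.

```python
def apply_column_rules(rows, defaults=None, required=None):
    """Drop rows missing a required value, then fill empty cells with defaults."""
    defaults = defaults or {}
    required = required or []
    result = []
    for row in rows:
        # Required column: skip the whole row if the value is empty.
        if any(not row.get(col) for col in required):
            continue
        filled = dict(row)
        # Default value: replace an empty value instead of leaving it blank.
        for col, default in defaults.items():
            if not filled.get(col):
                filled[col] = default
        result.append(filled)
    return result


rows = [
    {"name": "Chair", "price": "49"},
    {"name": "Table", "price": ""},
    {"name": "", "price": "15"},
]
cleaned = apply_column_rules(rows, defaults={"price": "n/a"}, required=["name"])
print(cleaned)
```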
[improvement] Blocking overlay removal
Web Extractor now allows you to select elements previously blocked by website overlays.
[improvement] Better Run history limit messaging
More informative messaging to clearly indicate the free plan limit on the number of items available inside Run history.
[bug fix] Save button disabled for empty dataset
Fixed an issue where the Web Extractor Save button was not disabled when the dataset was empty.
##8 June 2016
[improvement] Signup process
We improved the signup screens and added the ability to specify a country code for phone numbers.
[bug fix] Resetting password
We fixed an issue that prevented some users from resetting their password.
##7 June 2016
[improvement] Improved handling of groups
We have improved the handling of multiple groups in the extraction response.
[improvement] Improved handling of error cases for logs
Improved handling of error cases for logs with bad URLs; all errors are now reported.
[bug fix] Correct row count
We fixed a bug where newline characters would cause an incorrect row count.
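As a general illustration of why newlines can skew a row count (not a description of the actual bug), the Python sketch below compares counting physical lines with counting parsed CSV records when a quoted cell contains a newline. The sample data is hypothetical.

```python
import csv
import io

# One data cell ("Solid oak\nfour legs") contains a newline inside quotes.
csv_text = 'name,description\nChair,"Solid oak\nfour legs"\nTable,Round\n'

# Counting physical lines treats the embedded newline as an extra row.
naive_count = len(csv_text.splitlines()) - 1  # minus the header line

# A CSV parser keeps the quoted cell together, so each record is counted once.
parsed_count = sum(1 for _ in csv.reader(io.StringIO(csv_text))) - 1

print(naive_count, parsed_count)
```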
##6 June 2016
[improvement] Major update to our crawling infrastructure
Codenamed “Bees”, it uses a swarm of machines to handle your jobs a lot quicker, and will retry automatically.
[new feature] Results now include source URL and are sorted
Nicer data: we now include the source URL as a separate column in the CSV file, making it easier to collate and compare different datasets. The results are also sorted in the same order as the URLs in the original list.
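The ordering described here can be sketched in a few lines of Python: each row carries its source URL, and rows are sorted by the position of that URL in the original input list. The key name `source_url`, the helper `sort_by_input_order`, and the sample data are hypothetical.

```python
def sort_by_input_order(rows, input_urls, key="source_url"):
    """Sort rows to match the order of their source URL in input_urls."""
    position = {}
    for index, url in enumerate(input_urls):
        position.setdefault(url, index)  # first occurrence wins
    # Rows with an unknown URL go last; sorted() is stable, so rows from
    # the same URL keep their relative order.
    return sorted(rows, key=lambda row: position.get(row[key], len(input_urls)))


input_urls = ["https://example.com/a", "https://example.com/b"]
rows = [
    {"source_url": "https://example.com/b", "title": "B1"},
    {"source_url": "https://example.com/a", "title": "A1"},
    {"source_url": "https://example.com/b", "title": "B2"},
]
ordered = sort_by_input_order(rows, input_urls)
print([row["title"] for row in ordered])
```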
[improvement] Better Web extractor
The Web extractor is now more efficient, and more resilient to small changes in a page.
[update] Improved information capture during sign up process
We have added the ability to capture more information about users upon signup so that we can better personalize the user experience.
##1 June 2016
[new feature] CSS Toggle
Added the ability to disable CSS on the page to increase options to extract data. Useful when the data you need is not visible on the page. Tutorial page available.
[improvement] URL Generator list instant refresh
The generated URL list now refreshes on the fly as users edit any of the parameter inputs inside the URL Generator. This provides instant feedback on what the final URL list is going to contain. Previously, users would have to click off an input before they could see the updated list.
##31 May 2016
[improvement] Better website rendering
Various improvements made to the Web Extractor ensuring that websites are displayed as accurately as possible. These include:
- better support for lazy-loading images
- strategy to ensure complete loading of website content
- improved rendering speed
- better error handling and logging
##27 May 2016
[bug fix] Incomplete list of Extractors
Fixed an issue where previously only the latest 20 Extractors were displayed in the Dashboard.
##26 May 2016
[improvement] Quick tutorial: URL Generator
Link to a brief tutorial page on how to use the URL Generator now available via the Dashboard.
[improvement] Improved navigation within the app
E.g. “import.io” logos now link to appropriate home locations, depending on whether the user is logged in or logged out.
##19 May 2016
[new feature] URL Generator
Generate all the URLs you want to run the Web Extractor on with just one click. Based on the initial URL you created the Web Extractor with, “URL Generator” lets you easily tweak the parameters to create a list of URLs to run. This is particularly useful for defining page ranges, category lists, and also many other search scenarios.
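To make the page-range scenario concrete, here is a minimal Python sketch that builds a list of URLs by varying one query parameter over a range. It is only an illustration of the idea: the function `generate_urls`, its arguments, and the example URL are hypothetical and do not reflect how the URL Generator works internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def generate_urls(base_url, param, start, stop, step=1):
    """Return copies of base_url with `param` set to start, start+step, ... up to stop."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    urls = []
    for value in range(start, stop + 1, step):
        query[param] = str(value)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


# Hypothetical search URL: generate pages 1 to 3.
urls = generate_urls("https://example.com/search?q=chairs&page=1", "page", 1, 3)
for url in urls:
    print(url)
```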
[improvement] Web Extractor UI improvements
E.g., “Done” button now available in both website and data views.
[bug fix] Column navigation when there are no overflowing columns in the website view
Fixed an issue where redundant column navigation appeared when there were no overflowing columns in the Web Extractor website view.
[bug fix] Incorrect plan cancellation data confirmation message
Fixed an issue with the plan cancellation data confirmation message inside the Dashboard.
##17 May 2016
[improvement] Better ad-blocking and analytics tracking-blocking support
Quicker extraction and better Web Extractor experience for end-users. Also, site-owners can be confident that Import.io traffic will have no impact on their analytics.
Various JS-related improvements made to the Web Extractor ensuring that websites are displayed as accurately as possible, so that users can always select the right data.
##06 May 2016
[improvement] Clearing URL input upon successful addition
The URL input is now cleared automatically once a URL has been successfully added within the “Add or manage URLs” modal when a user creates or edits an Extractor.
[improvement] Better Extractor UI loading messages
More relevant Extractor UI loading messages being shown, e.g. when user edits an existing Extractor.
[bug fix] Error pasting list of URLs from spreadsheet
Fixed issues where URLs copied from a spreadsheet (Google Sheets, Excel, etc.) were corrupted within the URLs box inside the Dashboard.
[bug fix] Manual XPath not picking up attribute values
Fixed an issue involving Manual XPath not picking up attribute values (e.g. /@href) in an Extractor.
[bug fix] Browser back button not working properly inside Dashboard
Fixed issues where using the browser back button led to a slightly broken UI state inside the Dashboard.