Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

History back function does not reload article from ZIM in Firefox (regression) #366

Closed
Jaifroid opened this issue May 4, 2018 · 26 comments
Assignees
Labels
Milestone

Comments

@Jaifroid
Copy link
Member

Jaifroid commented May 4, 2018

This is a regression from #354. In Firefox ESR and Firefox Quantum, pressing the "Back" button to go back in the browsing history -- which calls the history.back() function -- replaces only the iframe's state back to the point at which we last called document.write(). In the released version 2.3 of Kiwix JS, Firefox correctly causes the required page to be re-read from the ZIM, resetting the state of the outer document as well. Since we now have an iframe.onload event, in the current version of Kiwix JS the parseAnchors() and other functions are called without voiding the iframe, which attempts to attach onclick events to all links again, even though onclick events are still somehow attached to the document referencing the previous page. This results in a mess (links to internal pages do not function, etc.).

Curiously this only happens with Firefox, not with Edge. Firefox appears unique in maintaining a history state for the iframe window, whereas Edge and Chromium do not, so they correctly reset the state of the outer document as well on history back. See https://stackoverflow.com/questions/3254985/back-and-forward-buttons-in-an-iframe and in particular this:

in Firefox around v42, sometimes the browser's own "back" button results in per-frame history.back(), rather than a top-frame history.back(), for no apparent reason (= inconsistent behavior of the native "back" button)

It seems we need to intercept history.back() and ensure it is applied to the top frame, if that is possible. If that isn't possible, then we will need to intercept the iframe.onload event, derive the articleContent's title and path from the HTML, as it doesn't seem to be available in the URL (from initial testing) and manually reload the article.

It is not enough merely to void the iframe.onload() event with a blank function (I have tested this): the article still remains in an inconsistent state, because the iframe hasn't been voided correctly. The inconsistent state, whereby event listeners for the previous article are still attached to elements in the DOM of the new article looks like a Firefox bug to me, related to its per-frame history. Or it could be a jQuery bug related to #361 (jQuery keeps hooks on the old article even after the new one is loaded).

@Jaifroid Jaifroid added the bug label May 4, 2018
@Jaifroid Jaifroid added this to the v2.3 milestone May 4, 2018
@Jaifroid Jaifroid self-assigned this May 4, 2018
@Jaifroid
Copy link
Member Author

Jaifroid commented May 4, 2018

Ha, @mossroy , the solution is laughably simple! It turns out that doing history.go(-2); instead of history.back(); causes Firefox to reload the full page of the previous article, and re-extract it from the ZIM. The conjecture that Firefox is keeping a snapshot of the iframe and adding it in as an extra piece of history seems to be confirmed. Of course now we have to decide how we can tell that one browser needs history.back(); and the other history.go(-2); . Maybe we can do something with history.pushState(); or maybe we can detect an intermediate history.popState happening at some point. I have no experience at all with the window.history object in html5.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 4, 2018

One other thought, @mossroy : it is possible to detect that the URL and the html loaded into the iframe do not correspond (because pushBrowserHistoryState() has not been called with a single Firefox history.back(), whereas it will be called in other browsers that are reloading the article from the ZIM). This gives us a non-browser-specific way of detecting, in the iframe.onload event, whether we need to go back further in the history, I think. If you have any thoughts on this, let me know.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 5, 2018

Possibly a better solution than manipulating window.history is suggested here:

https://stackoverflow.com/questions/19624452/workaround-to-firefox-creating-new-history-after-each-document-write

It does seem to have an effect on first trials.

EDIT: Here is the documentation on it: https://developer.mozilla.org/en-US/docs/Web/API/Document/open

See:

If you dont want to create a history entry, replace open() with open("text/html", "replace").

Jaifroid added a commit that referenced this issue May 5, 2018
@Jaifroid
Copy link
Member Author

Jaifroid commented May 5, 2018

Hi @mossroy , the "Simple fix" 3b228c9 referenced above is the minimal change needed to fix this issue, based on the MDN-suggested fix referenced in the previous comment (you may not have received an email copy of that comment).

However, I'd also recommend we do the "Suggested reorganization" 6972425. This rationalizes where we push a new history state, and also allows a bit more flexibility in what we push to the service worker and what we push to the jQuery routines.

Please note that I haven't tested the "Suggested reorganization" with Service Worker mode. If you like the idea of the suggested reorganization, I'd ask you to test if it works with Service Worker.

I've tested both of the above fixes fairly thoroughly in Edge, Firefox ESR, Firefox Quantum, IE11 and FFOS. I haven't tested in Chrome. Let me know which of these alternatives you'd like me to do a PR for.

@mossroy
Copy link
Contributor

mossroy commented May 6, 2018

I agree with both the simple fix and with the suggested reorganization. BUT I think we should first test if we cannot replace the document.write() by a jQuery-way to do it (as suggested in #361 )

Jaifroid added a commit that referenced this issue May 9, 2018
Jaifroid added a commit that referenced this issue May 9, 2018
Fixes #366 and possibly fixes #361 (subject to testing)
@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

[This continues discussion of how to load the iframe from #361, since this fix most obviously fixes the history.back issue that is currently in master, and incidentally may also fix the TypeError issue, but that needs further testing.]

So, we have good news! Commit 95fc2c3 on the Fix-history-back-regression branch reverts the way of loading the iframe to the use of a dummy article (just called article.html). With this fix, Firefox now correctly understands the iframe to have been loaded with the file:// protocol, and also respects and uses the base href of any new article injected into the iframe.

The procedure is as follows:

  1. Void the iframe by using the jQuery.empty(); function, as this ensures jQuery is aware that we are about to get rid of the iframe and avoids memory leaks (it's possible this step isn't necessary, but given issues reported in a number of places with jQuery causing memory leaks if we don't use jQuery methods to void nodes, I don't feel confident in removing this now);
  2. Reload the dummy article with iframe.src.= "article.html";
  3. Wait for the iframe.onload event;
  4. Set the new article's html using iframe.innerHTML.

This solves #366 , but I cannot easily test whether it solves #361 as I cannot reproduce that exact error. The code works reliably in Edge, Firefox, Internet Explorer and FFOS. I have not tested in Chrome.

Some observations:

  • This method uses innerHTML. We have already discovered that innerHTML may have CSP issues with Chrome extensions. However, looking at the guidance here (search for innerHTML on that page), it should not be an issue so long as it is made clear to the reviewer that the source of the HTML is local and controlled, and not from the Web;
  • The way I've written it would allow an easy revert to the cross-browser document.write(); should we have CSP problems.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

One further quick observation:

  • For some reason, the "client-js" class attached to the <html> tag of recent mobile-style ZIMs is being stripped when we inject the article's HTML into the iframe using .innerHTML

I don't understand why. I have inserted breakpoints in the code immediately prior to injection and immediately after injection. It is clearly there in htmlArticle before injection, and it is clearly gone when we inspect articleContent.documentElement.innerHTML immediately after injection. As this is consistently the case in Edge, IE and Firefox, it must be by design rather than a bug. This of course means that all headings in the document are "open". If we want clients capable of injecting JavaScript to be able to use the open-close routines, we would have to re-inject it back in to the HTML.

NB My JavaScript injection works with this code (I haven't pushed a rebase yet to git-hub, I've only done it locally). It's only because the client-js class is missing that the content in the headings does not collapse on click.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

Ah, of course! It's because we're using innerHTML instead of outerHTML... The documentElement is the html node. So obvious when you see it. Is outerHTML supported on FFOS?

@mossroy
Copy link
Contributor

mossroy commented May 9, 2018

Cool! I'll try to test that.
outerHTML should be supported on FFOS, as it is supported by Firefox>=11.
I have a regression on the base tag in ServiceWorker mode that I'll try to investigate : it adds the base tag several times (like A/A/A/articleName.html) : one more on each article you browse.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

Turns out that any attempt to write to an iframe contentDocument's outerHTML causes a "NoModificationAllowedError" in Edge and Firefox. We may have to inject classes from the <html> element into the <body> element instead, prior to injection of the iframe, if we want to maintain fidelity with the CSS coming from the ZIM (and if we want to enable the open-close functionality).

Hmm, that regression sounds strange...

@mossroy
Copy link
Contributor

mossroy commented May 9, 2018

I think the regression in ServiceWorker mode comes from the onload event on the iframe, that is executed on each article change, where we only want it to be executed once, to display the first article.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

Hmm, that was by design, because we re-load the article.html on each new article as a way of completely voiding the iframe and its DOM (by setting iframe.src = "article.html"). Otherwise we have CSS rules and scripts hanging around on the iframe. It's true we might not have to set the onload event each time -- maybe it persists between article loads, I didn't check for that. Just assumed it would be the same as a document.write();.

@mossroy
Copy link
Contributor

mossroy commented May 9, 2018

That's true, but only in jQuery mode. In ServiceWorker mode, the function displayArticleInForm is only called for the first article to display (for the main page, or after a search, or after a click on random article). After that, refreshing the iframe is handled by the browser (that gets the content from the ServiceWorker). I suppose the same onload event is executed at this moment. I suppose removing the onload event after it's executed for the first time should be enough : I'll test that.

Regarding coding-style, it might be better to not have 2 variables articleContent (with a different scope), that do not point to the same DOM element (one with .contentDocument, the other one without it). It's usually a source of misunderstanding or bugs.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

Maybe no need even to set that onload event at all in service worker, then. Just have to be sure to push the browser history, maybe it can be done outside the onload event after the iframe.src code.

Regarding re-use of the variable, I agree, but it was because the variable's pointer has become invalid at that point, so it seemed like an efficient way to reset it / void it. But maybe we should explicitly void it outside the onload event and then use a different variable name inside the event. I'm conscious of avoiding memory leaks, but maybe the browser's garbage collection will deal with it.

@mossroy
Copy link
Contributor

mossroy commented May 9, 2018

That depends if setting the src of an iframe is synchronous or asynchronous. If it's synchronous, we don't need the onload event, and can put the corresponding code after setting the src.
If it's asynchronous, it creates a race condition, and we need to keep the onload event, regardless of the jquery or serviceworker mode.
I believe it's asynchronous (it would make sense for a browser to load an iframe in the background), but am not 100% sure.
In any case, adding :
document.getElementById("articleContent").onload = function() {};
inside the onload function seems to work, at least in ServiceWorker mode

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

OK. Yes it is most definitely asynchronous! I just thought that if all you have to do is get the code into the iframe for service worker, then because the programme effectively stops at that point, you don't need to have an onload event for service worker. It will just receive the iframe when it is loaded. The .onload event doesn't then do anything further in service worker mode, it just stops, no (apart from pushing the browser history)?

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

Sorry, I wasn't looking at the code. Yes, you need to load htmlArticle inside the onload function. Somehow I'd misremembered it being outside that function (because it used to be)...

@Jaifroid
Copy link
Member Author

Jaifroid commented May 9, 2018

Commit 18a75fd adds to the body element any css classes stripped from the html element due to the use of the .innerHTML injection method. It's done generically. Of course it's not as satisfactory as having the class(es) in the html element where they should belong, but the presence of CSS in the html tag can only be being used as a "switch", since it's not possible otherwise to style the html tag (there are no styles that apply to it as far as I know). So for the intended purpose, adding it to the classList of the body tag should produce the same result. There doesn't seem to be any way of adding it back to the html tag in the iframe.

Commit 5bc75d3 changes the variable names and turns off the iframe.onload function inside the iframe.onload event.

@mossroy
Copy link
Contributor

mossroy commented May 9, 2018

Mmmmh... That looks good but, in the end, is it really simpler and safer than the document.write solution? I'm wondering if the arguments I gave for this solution were not wrong. What do you think?

@Jaifroid
Copy link
Member Author

Jaifroid commented May 10, 2018

It seems to be, as we say in England, "six of one and half a dozen of the other". The innerHTML method seems to have the disadvantage of not being able to manipulate the html element. Document.write() has a certain stigma attached to it (some developers say "don't use it", others say "use it only if you know what you're doing", others say "it's the only cross-browser way of writing a document to the iframe"), and it has that difficulty with jQuery (that I believe we solve by using jQuery.empty() before we write to the iframe).

It seems the ultimate solution might be the srcdoc element (untested for us), but it's not yet fully cross-browser (and although the polyfill will work in Edge I believe, I'm not sure it will work in FFOS if it's needed there). At least we have options!

So, I've written this in a way that it should be easy more or less to "drop in" iframe.write() where we currently have iframe.src=.... I'll do a quick version for testing which seems the better option especially re. service worker.

Jaifroid added a commit that referenced this issue May 10, 2018
@Jaifroid
Copy link
Member Author

Jaifroid commented May 10, 2018

So please test commit 9dd2887 especially with Service Worker in mind. And then test to see if this line is still necessary (or maybe it's just good practice?).

If we end up squashing this commit (or similar), I think we should leave at least one intermediate step in the commit history showing how to do it with innerHTML (and so the above discussion can be retraced if necessary).

@mossroy
Copy link
Contributor

mossroy commented May 10, 2018

I've pushed a commit with some proposed refactoring. It seems to work well in ServiceWorker mode, at least on Chromium.
I had to change the way to remove jQuery cache, because empty() function was only removing the body content, keeping the body tag itself, that was still giving the "can't access dead object" exception.
It seems to fix #361. I tested on Firefox 59 and Chromium 65, not yet on other browsers. I did not test on stackexchange ZIM files, but it should work the same.

I tend to prefer this document.write solution, because, in the end, both solutions need to remove the jQuery cache, but this one does not need to add an onload event for ServiceWorker mode, and does not need to manually put back the html tag attributes (it's more generic).

I agree that it would be good to keep the "blank html file + innerHTML" solution somewhere. But if we rebase/squash, the commit ids will change anyway. Maybe another way to keep this solution would be to create another git branch right after the merge, containing only this other solution.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 10, 2018

OK, good that you found a more complete way to remove the iframe document with jQuery. That mostly looks fine.

There was a reason I included Service Worker mode in the .onload event, which is that if you want to support injection of inline scripts and inline events (JavaScript), then you'll need to do something like this:

        iframe.onload = function() {
            pushBrowserHistoryState(dirEntry.namespace + "/" + dirEntry.url);
            parseEvents();
            injectInlineScriptsJQuery(inlineJavaScripts);
        // If the ServiceWorker is not useable, we need to fallback to parse the DOM
        // to inject images, CSS etc, and replace links with javascript calls
        if (contentInjectionMode === 'jquery') {
                parseAnchorsJQuery();
                loadImagesJQuery();
                loadCSSJQuery();
                //JavaScript loading currently disabled
                //loadJavaScriptJQuery();
            }
        };

This is from the javaScript-support branch (but of course it's untested on Service Worker). If you can envisage wanting to support inline scripts in this mode, it might be worth us laying the scaffolding now.

@mossroy
Copy link
Contributor

mossroy commented May 11, 2018

Regarding inline javascript and ServiceWorkers, it might be dealt differently.
First, I'll try to convince that inline javascript should be removed in future ZIM files, because of CSP (that can exist in other contexts, too). I really think that this job (of removing inline javascript) should be done by the ZIM maker, not by the client. It would be much more efficient. And it would avoid having to parse the DOM in ServiceWorker mode (so far we did not need to do it, and it was the idea to be 100% generic).
If it's not possible, we'll think about which solutions we have.
Maybe it's time to create a PR : I'll try to test it a bit more next week.

@Jaifroid
Copy link
Member Author

Jaifroid commented May 12, 2018

So I've created a PR #368 for this. I've done it as a separate branch in order to leave the references to the experimental branches Fix-history-back-regression and fix-TypeError-tests intact above. In fact I think we only need to keep Fix-history-back-regression as it contains code for a generally functioning .innerHTML solution should we run into further problems with document.write(). It's good to have alternatives. Once this PR is merged we can delete the PR's branch and the fix-TypeError-tests, I think.

Jaifroid added a commit that referenced this issue May 21, 2018
Also reverts to innerHTML method of injecting the iframe due to incompatibilities of document.write() with Service Worker mode in some browsers. Fixes #361 , #366 , possibly #365.
Jaifroid added a commit that referenced this issue May 21, 2018
Also reverts to innerHTML method of injecting the iframe due to incompatibilities of document.write() with Service Worker mode in some browsers. Fixes #361 , #366 , possibly #365.
@Jaifroid
Copy link
Member Author

Jaifroid commented May 21, 2018

Fixed with #369 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants