This version (2017/05/27 13:44) is a draft.
Approvals: 0/1

[09:00:12] <Narigo> pmlopes, good morning - what does CTE mean in this context? :) https://github.com/vert-x3/vertx-mysql-postgresql-client/issues/25#issuecomment-143770949

[09:01:32] <pmlopes> hello, CTE stands for “Common Table Expression” which in SQL (in a simplistisc way) are expressions that use the WITH keyword

[09:02:20] <pmlopes> http://www.postgresql.org/docs/9.4/static/queries-with.html

[09:02:43] <pmlopes> https://technet.microsoft.com/en-us/library/ms190766%28v=sql.105%29.aspx

[09:05:58] <Narigo> pmlopes, I guess in MySQL we only have the option to query for “last_insert_id()” to get the latest inserted id in an auto_increment table. How does the JDBC client return the id? Does it always send this extra query, even if you don't want to check the latest inserted id?

[09:08:32] <pmlopes> narigo, that specific part is driver specific but there it does not mean that they always issue a second query. As i wrote in the issue the driver could rewrite the query to append “ returning pk_column_name” which would make the query atomic, no other queries would interfere with the generated values.

[09:10:35] <Narigo> pmlopes, is that really the way the underlying jdbc drivers work? I mean, are they really changing your statement when you add a parameter to return auto_increment ids?

[09:11:05] <pmlopes> narigo, the thing gets more complicated for example when you insert 2 rows, then the last_insert_id() will only return the last one.

[09:11:32] <pmlopes> narigo i am not into the details of the jdbc drivers, but i can try to have a look

[09:11:52] <Narigo> pmlopes, that's true.. how does it work in JDBC drivers then? :)

[09:12:48] <Narigo> Yeah, let's try to see how they operate first… I doubt that they are rewriting the statements themselves

[09:14:53] <pmlopes> narigo, postgres does not use last_insert_id it rewrite the query, here for example: https://github.com/impossibl/pgjdbc-ng/blob/844be8b94b8e0491370eebcc02b0f141a3aa6a7e/src/main/java/com/impossibl/postgres/jdbc/PGSimpleStatement.java#L128

[09:15:14] <pmlopes> however that is not the official driver, let me see how the official driver does it…

[09:17:08] <Narigo> https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/jdbc3/AbstractJdbc3Statement.java#L146

[09:17:10] <Narigo> pmlopes,

[09:17:24] <pmlopes> the official also, https://github.com/pgjdbc/pgjdbc/blob/543ffd0cc012ed39654d3360c9cf816589ff3f1d/org/postgresql/jdbc3/AbstractJdbc3Statement.java#L146

[09:17:25] <Narigo> it also adds returning

[09:31:35] <Narigo> pmlopes, I guess we could do something similar in our code, without getting into the driver itself, WDYT? In PostgreSQL it's done very simple - I doubt that it works with multiple statements (just changes the statement)

[09:31:58] <Narigo> “just changes the statement in a very simple way”

[09:47:39] <Narigo> pmlopes, another thing, in JDBC, it looks like you can tell the driver if you want the auto-generated ids or not. In the vertx-jdbc-client, you always return it - is this correct?

[09:48:12] <pmlopes> yes it is always on

[09:48:54] <pmlopes> i am not aware why that decision was taken but since we need to keep API compatibility we cannot change that now

[09:49:03] <Narigo> As in https://github.com/vert-x3/vertx-sql-common/blob/master/src/main/java/io/vertx/ext/sql/SQLConnection.java we never say anything about “it returns the ids implicitly”… a client can always get the ids by “doing it manually” (SELECT last_insert_id() or … RETURNING id)

[09:54:09] <Narigo> I'm looking at the MySQL JDBC driver implementation currently and try to find out how it works on their end…

[10:04:35] <pmlopes> i've jumped the boat of mysql to psql like 5 years ago, so i'm not so fresh into its internals but i don't recall anything like “insert … returning id”

[10:20:59] <Narigo> pmlopes, so after digging deep into the MySQL driver code, I believe it always sends the last inserted id and update count in it's binary protocol (https://github.com/mysql/mysql-connector-j/blob/release/5.1/src/com/mysql/jdbc/MysqlIO.java#L3057)

[10:25:43] <Narigo> pmlopes, I guess that Mauricios MySQL driver actually returns the last inserted id, if it finds one: https://github.com/mauricio/postgresql-async/blob/2c18a425c7b1d2d87d3b6530a0c1d48bfeb192cb/mysql-async/src/main/scala/com/github/mauricio/async/db/mysql/MySQLConnection.scala#L146

[10:26:59] <pmlopes> then we need to check if it really works, because there are reports on the forum that it doesn't

[10:29:07] <Narigo> pmlopes, can you give me a link? It shouldn't work for PostgreSQL, but it might already work for MySQL then

[10:30:03] <pmlopes> https://groups.google.com/d/topic/vertx/ZFUSx3_HNsw/discussion

[10:30:13] <pmlopes> the issue was related to mysql

[10:30:58] <Narigo> Ok, I need to check that

[10:34:18] <purplefox> cescoffier: pmlopes: temporalfox: Hi folks - there are a few core PRs that I'd like to merge, all pretty simple stuff. any chance someone could take a look?

[10:34:35] <cescoffier> purplefox: I'm in a code reivew day

[10:34:41] <cescoffier> so gonna have a look

[10:34:44] <purplefox> ah good :)

[10:34:50] <purplefox> thanks

[10:35:03] <cescoffier> BTW, do you remember, at some point, the Starter class was adding . to the classpath is not set

[10:46:36] <pmlopes> purplefox, i'll also give a look

[10:47:20] <pmlopes> do you have a list? or just run over all open PRs on core?

[10:56:29] <temporalfox> purplefox hello

[10:56:42] <purplefox> hi

[10:56:57] <temporalfox> I've some info about the cluster problem

[10:57:05] <temporalfox> at least I understand the behavior

[10:57:28] <temporalfox> I'm able to fix it, but I'm not sure this is the right thing to do, but the change at least makes the problem go away

[10:57:39] <purplefox> ok

[10:57:45] <purplefox> sounds like progress

[10:58:50] <temporalfox> the problem is that the HAManager uses a distributed map for the clusterMap

[10:58:54] <temporalfox> and when two nodes are killed

[10:59:05] <temporalfox> it does not have all the info to clean the subs

[10:59:12] <temporalfox> it does have info for one node but not the other

[10:59:37] <temporalfox> I made a quick change in Hazelcast to use ReplicatedMap instead of the problem goes away

[11:00:06] <purplefox> hmm, but the distributed map should have backup > 0 so should be ok

[11:00:08] <temporalfox> I think also adding more nodes to the cluster would solve the problem as distribution would be better

[11:00:26] <temporalfox> yes but two nodes dies at the same time

[11:00:31] <temporalfox> with 3 nodes

[11:00:50] <purplefox> i see

[11:00:56] <temporalfox> like a quorum issue probably

[11:01:12] <purplefox> that makes sense

[11:01:37] <purplefox> so , yes, replicatedmap seems to make sense here

[11:01:47] <temporalfox> perhaps we can use only replicated map for this particular map

[11:02:08] <purplefox> agreed, we don't want to do this in general

[11:02:10] <temporalfox> by checking the map name in the getSyncMap

[11:02:22] <purplefox> and users can always edit cluster.xml if they want more replication for other maps

[11:02:23] <temporalfox> and agree that this special map used by vertx needs to be replicated

[11:02:42] <temporalfox> the __vertx.haInfo name

[11:02:46] <purplefox> we can just change the settings in default_cluster.xml

[11:02:53] <temporalfox> ok

[11:02:56] <temporalfox> I will try

[11:02:59] <purplefox> no need to check the name

[11:03:03] <temporalfox> cool

[11:03:19] <temporalfox> will do that and try to have a reproducer in HZ for this case

[11:03:33] <purplefox> actually.. we should make the naming consistent, right now we use _vertx.haInfo for cluster map but use “subs” for topic map

[11:03:44] <purplefox> we should probably user _vertx.subs for topic map

[11:03:48] <temporalfox> yes

[11:03:51] <temporalfox> speakinng of names

[11:04:13] <temporalfox> for internal event bus registration

[11:04:20] <temporalfox> like generated reply address

[11:04:28] <temporalfox> we could also use a prefix

[11:05:30] <temporalfox> anyway going to have a look at HZ project reproducer and config

[11:05:32] <purplefox> do you mean for easier debugging so you can recognise them?

[11:05:35] <temporalfox> yes

[11:05:38] <temporalfox> also

[11:05:40] <temporalfox> for metrics

[11:05:53] <purplefox> yes maybe, but this could give a performance penalty if you have many addresses

[11:06:00] <temporalfox> this allow to understand which metrics are business and which are not

[11:06:10] <temporalfox> ok

[11:06:15] <purplefox> as you will be scanning the first X chars every time (times millions) which are always the same

[11:06:20] <purplefox> to find matching address

[11:06:33] <purplefox> maybe we could enable this in “debug mode” ?

[11:06:37] <purplefox> (e.g. with system property)

[11:16:49] <purplefox> temporalfox: also.. in the users case, it also might be that the subs entry is on a different node, so he will need to make the subs map replicated or increase the backup size to guarantee it to work

[11:17:44] <purplefox> perhaps we should add a note in the hazelcast cluster manager - “if more than N nodes die at the same time, where N = backup count” then there is a possibility that topic information can be lost

[11:20:00] <purplefox> so…. really this is a user configuration issue - we can't guarantee the integrity of the event bus state with concurrent failure of more than N nodes where N = backup count

[11:20:04] <purplefox> default backup count = 2

[11:20:09] <purplefox> =1 I mean

[11:20:18] <purplefox> we should make this clearer though

[11:23:58] <temporalfox> so do we fix it with a replicatd map or do we just add a note ?

[11:25:46] <purplefox> do you know how to configure the map as replicated in default-cluster.xml ?

[11:26:03] <temporalfox> I havent looked yet

[11:26:12] <temporalfox> I was about to write a test for this particular case

[11:26:24] <temporalfox> in vertx cluster bus tests

[11:26:38] <temporalfox> but now I need to pick up my daughter at school :-)

[11:26:58] <purplefox> np. I think we could just make this a config issue and add something to the docs

[11:27:10] <purplefox> but I think we also need to make the naming consistent so the config covers both maps

[11:27:47] <purplefox> then we can just say “either map the map replicated, or increase backup counts if you want to cope with concurrent failure of more than N nodes”

[11:28:28] <purplefox> although increasing backup counts does not guarantee just reduces proability

[11:58:51] <cescoffier> pmlopes: Is Route#last(boolean) method gone, or just moved ?

[12:01:02] <s0va> purplefox: https://github.com/eclipse/vert.x/blob/fb7fd9624d6034a8a8be80515317998c92ec4f4e/src/main/java/io/vertx/core/eventbus/impl/EventBusImpl.java#L438 ⇒ i'm getting null pointer exceptions here sometimes

[12:03:04] <s0va> maybe there should be if (sid == null) { do something }

[12:04:13] <purplefox> s0va: you're probably trying to use an event bus instance before it's finished startng up, but without seeing your code it's hard to say

[12:11:22] <temporalfox> purplefox I'm not sure we can solve this by config because HazelcastInstance as separate methods for creating maps : getReplicatedMap and getMap

[12:16:43] <purplefox> temporalfox: it's not possible to configure in the config?

[12:22:27] <temporalfox> it's not the same API and the config.XML does not have replicated map config

[12:22:51] <temporalfox> replicate map returns a ReplicatedMap object and distributed map returns an IMap interface

[12:22:57] <temporalfox> both extends more or less java.util.Map

[12:28:34] <purplefox> ok i think it's ok just to use distributed map and make a note about backup count

[12:28:47] <purplefox> (and change the naming)

[12:29:31] <purplefox> looking at the docs it says that replicatedmap is weakly consistent and might get lost or missing updates which won't work for us anyway

[12:40:42] <pmlopes> @cescoffier it dropped the parameter it wasn't used and was giving the wrong semantics since it was being used to set a given route as the last entry of the router

[12:48:41] <purplefox> temporalfox: is the GenModule annotation no longer used? (vertx-sync doesn't build with it)

[12:58:23] <temporalfox> now it's ModuleGen

[12:58:31] <temporalfox> like VertxGen

[12:58:38] <temporalfox> it was changed a couple of weeks ago

[12:59:35] <temporalfox> purplefox ok I will do that and check the case is solved by increasing the backup count

[13:00:42] <purplefox> ok, i removed it from vertx-sync and it still seems to build without it

[13:06:14] <temporalfox> if you don't use codegen, you don't need it

[13:06:21] <cescoffier> pmlopes: thanks

[13:09:55] <temporalfox> purplefox also this has impact not only on event bus but also HA

[13:10:08] <purplefox> +1

[13:10:14] <temporalfox> when the failover node is chosen, an absence of data in the clusterMap

[13:10:20] <temporalfox> would prevent a node to be chosen for failover

[13:10:54] <temporalfox> and also affect quorum size

[13:11:03] <temporalfox> and have lower quorum size than expected

[13:12:34] <purplefox> I think it's just a distributed systems problem, not really Vert.x specific - if you have N replicas you can cope with at most N - 1 concurrent failures

[13:12:51] <purplefox> but we should document it

[13:13:01] <purplefox> the behaviour here is also specific to the cluster manager

[13:13:09] <purplefox> e.g. another cluster manager might not used a distributed map

[13:13:11] <temporalfox> yes

[13:19:53] <cescoffier> pmlopes: can you check the redis client changes: https://github.com/vert-x3/wiki/wiki/3.1.0---Breaking-changes

[13:20:21] <pmlopes> yes

[13:21:42] <cescoffier> thanks

[13:22:00] <cescoffier> looks like all method taking a list as parameters have been removed

[13:44:07] <cescoffier> purplefox, pmlopes, temporalfox - https://github.com/vert-x3/wiki/wiki/3.1.0---Breaking-changes contains all breaking changes from the official stack. Not much !

[14:08:45] <purplefox> cescoffier: thanks. in general we shouldn't make breaking changes outside major versions unless:

[14:08:53] <purplefox> a) There's a security issue that needs to be fixed

[14:09:01] <purplefox> b) Required by a bug fix

[14:09:31] <purplefox> this will be even more important in a Vert.x product where they are very strict about this

[14:09:54] <cescoffier> in core, 2 typos have been fixed, while the eventLoop method in something that is more or less internal

[14:10:52] <cescoffier> I've tried to use a new plugin to detect the changes, works quite well. Before I was using clirr, but clirr does not support Java 8 (because of BCEL that does not support java 8)

[14:11:25] <cescoffier> I will add the plugin configuration in our parent, so checking compatibility will be “simpler”

[14:15:32] <purplefox> cool, great job!

[14:34:53] <pmlopes> @cescoffier redis updates are in the wiki mostly should i also add the new methods? pretty much all the changes there was to keep the API consistent, for commands that take many args the method name is “command” + “Many” and just “command” when there is just 1 argument

[14:36:20] <pmlopes> this affected like 8 methods in the whole API

[14:45:03] <purplefox> pmlopes: I think that is probably ok for now

[14:45:14] <purplefox> pmlopes: but we should be more careful going forward

[14:48:17] <pmlopes> agree, the main issue was that redis 3.0.0 was not tested properly and lots of issues were found and fixed on 3.1.0 but i don't see any API changes in the near future, there is a big PR now to support redis sentinel but that will not break the API once is it fully reviewed

[17:07:04] * ChanServ sets mode: +o purplefox [17:37:44] * ChanServ sets mode: +o purplefox