My doubts concern the analysis of the results after training the model: when I test the network on a test set (examples that don't belong to the training set), how should I interpret the AIC, BIC and log-likelihood scores? When are they consistent or good enough? Do they have significance on their own, or only when compared with the scores of another model? And when comparing two models, how do I choose the better one?
The AIC, BIC and log-likelihood (LL) scores are criteria for selecting among a set of candidate models representing a data set. The LL measures how well a model fits the data, but it can always be increased by making the model more complex, so on its own it should only be used to compare models of the same complexity. Both the BIC and AIC scores are based on the LL with a penalty for complexity; the penalty term differs between the two (the BIC penalty grows with the size of the data set, so it penalizes extra parameters more heavily than AIC on all but very small samples).
You should use these scores to select between candidate models as representations of a data set. With the convention used here, the higher the score, the better the model represents the data (note that many software packages report AIC/BIC with the opposite sign, e.g. AIC = 2k − 2·LL, in which case lower is better).
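As a rough illustration of how the penalties work, here is a minimal sketch (using made-up Gaussian data rather than a Bayesian network) that compares two candidate models by LL, AIC and BIC, with the textbook conventions AIC = 2k − 2·LL and BIC = k·ln(n) − 2·LL, under which lower is better:

```python
import math
import numpy as np

# Hypothetical data set: 200 draws from a normal distribution N(2, 1.5^2)
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=200)

def gaussian_loglik(x, mu, sigma):
    """Log-likelihood of the sample x under N(mu, sigma^2)."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

n = len(data)

# Model A: mean fixed at 0, only sigma fitted -> 1 free parameter
sigma_a = math.sqrt(np.mean(data ** 2))
ll_a, k_a = gaussian_loglik(data, 0.0, sigma_a), 1

# Model B: mean and sigma both fitted -> 2 free parameters
ll_b, k_b = gaussian_loglik(data, np.mean(data), np.std(data)), 2

for name, ll, k in [("A", ll_a, k_a), ("B", ll_b, k_b)]:
    aic = 2 * k - 2 * ll           # lower is better in this convention
    bic = k * math.log(n) - 2 * ll
    print(f"model {name}: LL={ll:.1f}  AIC={aic:.1f}  BIC={bic:.1f}")
```

The more complex model always wins on raw LL; AIC and BIC only prefer it when the improvement in fit outweighs the penalty for the extra parameter.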
Can I deduce something (good or bad) about my network?
No, not as far as I know. The scores have no absolute meaning; you can only use them as relative measures to compare candidate models and select the one with the best score.
Another parameter of the analysis is the ROC curve: what role does it play in evaluating the goodness of the model?
The ROC can be used for assessing the performance of a classification model, i.e., a model that assigns a class label to a set of instances. The ROC curve (and the area under it, the AUC) is a measure of classification performance showing the true positive rate as a function of the false positive rate as the decision threshold is varied.
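As a sketch of what that computation involves (with toy labels and scores, not output from a real network), the curve and its area can be obtained by sweeping the decision threshold over the predicted scores:

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold
    through the predicted scores from highest to lowest."""
    order = np.argsort(-scores)          # sort instances by score, descending
    y = np.asarray(y_true)[order]
    tps = np.cumsum(y)                   # true positives accepted so far
    fps = np.cumsum(1 - y)               # false positives accepted so far
    tpr = tps / y.sum()
    fpr = fps / (len(y) - y.sum())
    # prepend the (0, 0) corner of the curve
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Toy example: 0/1 class labels and classifier scores for 8 instances
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.3, 0.25, 0.35, 0.8, 0.65, 0.2, 0.9])

fpr, tpr = roc_points(y_true, scores)
area = auc(fpr, tpr)
print("AUC:", area)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking; the AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one.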
You can find some introductory material on these concepts on Wikipedia.